theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/
Other
144 stars 27 forks source link

Incompatibility with R {anndata} #75

Closed lazappi closed 1 year ago

lazappi commented 1 year ago

As discovered in #74 {zellkonverter} is currently incompatible with the R {anndata} package.

Reproducible example

library(zellkonverter)
#> Warning: package 'zellkonverter' was built under R version 4.2.1

# This works fine before loading {anndata}
file <- system.file("extdata", "krumsiek11.h5ad", package = "zellkonverter")
sce <- readH5AD(file)
#> Warning: The names of these selected uns$highlights items have been modified
#> to match R conventions: '0' -> 'X0', '159' -> 'X159', '319' -> 'X319', '459' ->
#> 'X459', '619' -> 'X619'

# After loading {anndata} it fails
library(anndata)
sce <- readH5AD(file)
#> Error in adata$uns$keys(): attempt to apply non-function

Created on 2022-10-17 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.2.0 (2022-04-22) #> os macOS Catalina 10.15.7 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Berlin #> date 2022-10-17 #> pandoc 2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> anndata * 0.7.5.5 2022-09-23 [1] CRAN (R 4.2.0) #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0) #> basilisk 1.9.11 2022-09-19 [1] Bioconductor #> basilisk.utils 1.9.4 2022-09-19 [1] Bioconductor #> Biobase 2.57.1 2022-05-19 [1] Bioconductor #> BiocGenerics 0.43.4 2022-09-11 [1] Bioconductor #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.2.0) #> cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.0) #> DelayedArray 0.23.2 2022-09-15 [1] Bioconductor #> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0) #> dir.expiry 1.5.1 2022-09-05 [1] Bioconductor #> evaluate 0.17 2022-10-07 [1] CRAN (R 4.2.0) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0) #> filelock 1.0.2 2018-10-05 [1] CRAN (R 4.2.0) #> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0) #> GenomeInfoDb 1.33.10 2022-10-14 [1] Bioconductor #> GenomeInfoDbData 1.2.9 2022-09-29 [1] Bioconductor #> GenomicRanges 1.49.1 2022-08-18 [1] Bioconductor #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0) #> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0) #> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0) #> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.0) #> IRanges 2.31.2 2022-08-18 [1] Bioconductor #> jsonlite 1.8.2 2022-10-02 [1] CRAN (R 4.2.0) #> knitr 1.40 2022-08-24 [1] CRAN (R 4.2.0) #> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.0) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0) #> Matrix 1.5-1 2022-09-13 [1] CRAN (R 4.2.0) #> MatrixGenerics 1.9.1 2022-06-28 [1] Bioconductor #> matrixStats 0.62.0 2022-04-19 [1] CRAN (R 4.2.0) #> png 0.1-7 2013-12-03 [1] CRAN (R 4.2.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0) #> Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.0) #> RCurl 1.98-1.9 2022-10-03 [1] CRAN (R 4.2.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0) #> reticulate 1.26 2022-08-31 [1] CRAN (R 4.2.0) #> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.0) #> rmarkdown 2.17 2022-10-07 [1] CRAN (R 4.2.0) #> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0) #> S4Vectors 0.35.4 2022-09-18 [1] Bioconductor #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0) #> SingleCellExperiment 1.19.1 2022-09-30 [1] Bioconductor #> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.0) #> stringr 1.4.1 2022-08-20 [1] CRAN (R 4.2.0) #> SummarizedExperiment 1.27.3 2022-09-13 [1] Bioconductor #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0) #> xfun 0.33 2022-09-12 [1] CRAN (R 4.2.0) #> XVector 0.37.1 2022-08-25 [1] Bioconductor #> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0) #> zellkonverter * 1.7.7 2022-10-04 [1] Bioconductor #> zlibbioc 1.43.0 2022-05-05 [1] Bioconductor #> #> [1] /Users/luke.zappia/Library/R/x86_64/4.2/library/__bioc-devel #> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library #> #> ─ Python configuration ─────────────────────────────────────────────────────── #> python: /Users/luke.zappia/Library/Caches/org.R-project.R/R/basilisk/1.9.11/zellkonverter/1.7.7/zellkonverterAnnDataEnv-0.8.0/bin/python #> libpython: /Users/luke.zappia/Library/Caches/org.R-project.R/R/basilisk/1.9.11/zellkonverter/1.7.7/zellkonverterAnnDataEnv-0.8.0/lib/libpython3.8.dylib #> pythonhome: /Users/luke.zappia/Library/Caches/org.R-project.R/R/basilisk/1.9.11/zellkonverter/1.7.7/zellkonverterAnnDataEnv-0.8.0:/Users/luke.zappia/Library/Caches/org.R-project.R/R/basilisk/1.9.11/zellkonverter/1.7.7/zellkonverterAnnDataEnv-0.8.0 #> version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:47) [Clang 12.0.1 ] #> numpy: /Users/luke.zappia/Library/Caches/org.R-project.R/R/basilisk/1.9.11/zellkonverter/1.7.7/zellkonverterAnnDataEnv-0.8.0/lib/python3.8/site-packages/numpy #> numpy_version: 1.22.3 #> #> NOTE: Python version was forced by use_python function #> #> ────────────────────────────────────────────────────────────────────────────── ```

Details

After loading {anndata} an AnnData object is no longer a raw Python object, instead it is wrapped by the AnnDataR6 class. This has slightly different method names which causes the issue.

Possible solutions

  1. Check if {anndata} is loaded using if ("anndata" in (.packages())) and issue a warning about compatibility
  2. Check the class of the object and use different methods depending on if it is a Python object or AnnDataR6
lazappi commented 1 year ago

@rcannood Do you have any other insights on this or suggestions for other solutions?

rcannood commented 1 year ago

I see! This is indeed a problem :) The issue is that reticulate has a set of py_to_r and r_to_py converters. Since anndata the R package adds (amongst others) py_to_r.anndata._core.anndata.AnnData, running anndata <- import("anndata"); anndata$read_h5ad(...) causes the Python AnnData object to be converted into an AnnDataR6.

I'm working on a few alternative solutions, one of which might be the most desirable. Give me a minute ... :)

rcannood commented 1 year ago
  1. A clean but tedious solution would be to use anndata <- import("anndata", convert = FALSE) instead of anndata <- import("anndata"). However, whenever you'd want to make use of a slot in the Python AnnData object you'd have to call py_to_r(...) manually, e.g. py_to_r(adata$X).

  2. Switch to using AnnDataR6 to convert between AnnData and SCE. This would cause zellkonverter to depend anndata (R package), which might not be desirable at this stage.

  3. Calling r_to_py_ifneedbe(anndata$read_h5ad(...) instead of anndata$read_h5ad(...) with r_to_py_ifneedbe defined as:

r_to_py_ifneedbe <- function(x, convert = TRUE) {
  if (inherits(x, "python.builtin.object")) {
    x
  } else {
    r_to_py(x, convert = convert)
  }
}

I'm doing a few experiments to see if approach 5 indeed works without many changes to zellkonverter.

rcannood commented 1 year ago

Hmm... Simply converting an AnnDataR6 to a anndata._core.anndata.AnnData (as in https://github.com/theislab/zellkonverter/compare/master...rcannood:zellkonverter:fix/compat_anndatar6?expand=1 ) doesn't work straight out of the box because anndata (the R package) also adds converters for other Python objects:

S3method(py_to_r,anndata._core.aligned_mapping.LayersBase)
S3method(py_to_r,anndata._core.anndata.AnnData)
S3method(py_to_r,anndata._core.raw.Raw)
S3method(py_to_r,anndata._core.sparse_dataset.SparseDataset)
S3method(py_to_r,collections.abc.KeysView)
S3method(py_to_r,collections.abc.Mapping)
S3method(py_to_r,collections.abc.Set)
S3method(py_to_r,h5py._hl.dataset.Dataset)
S3method(py_to_r,pandas.core.indexes.base.Index)

Even if an AnnDataR6 object is converted back into an anndata._core.anndata.AnnData object, downstream code still errors because of these implicit conversions. For instance:

    meta_list <- .convert_anndata_slot(
        adata, "uns", py_builtins$list(adata$uns$keys()), "metadata",
        select = uns
    )

In here, running adata$uns$keys() errors because adata$uns gets converted to a named list. We could substitute the code above for something like:

    uns_val <- adata$uns
    uns_keys <- if (is.list(uns_val)) names(uns_val) else py_builtins$list(uns_val$keys())
    meta_list <- .convert_anndata_slot(
        adata, "uns", uns_keys, "metadata",
        select = uns
    )

However this would require going through the code and making sure that assumed python-objects are not automatically converted by reticulate. On the other hand, if reticulate ever decides to add default implementations for things like collections.abc.Container the same exercise would need to be performed (Similar to issue 1226 in https://github.com/rstudio/reticulate/issues)

lazappi commented 1 year ago

Thanks for the detailed response! I'm thinking maybe option 3 is sounding like the best solution at this stage? It's maybe the most work but should help avoid similar issues in the future and maybe has some other advantages. Can you think of any downsides to this (other than it being a bit tedious)?

rcannood commented 1 year ago

I see! I agree option 3 would be a lot cleaner. One think I'm wondering about is whether to set convert to TRUE or FALSE in SCE2AnnData since that object can be used by end users.

I think setting convert = FALSE could break downstream users' scripts because they might try to read something from the AnnData (though I think that scenario is probably unlikely?).

Setting convert = TRUE would result in the same problem as in #74 if users expect the AnnData object to return python objects in certain instances whereas {anndata} will convert those objects to R data structures on purpose.

rcannood commented 1 year ago

Agh! Setting convert = TRUE in SCE2AnnData(...) breaks the symmetry with AnnData2SCE(...). That is, running AnnData2SCE(SCE2AnnData(...)) doesn't work.

Perhaps it makes more sense to add convert as a parameter to SCE2AnnData and let the user choose?

Fyi, I've been experimenting with creating the necessary changes for options 3 here.

lazappi commented 1 year ago

I think I need to play around a bit to understand this a bit better but I think adding a convert argument is probably the most flexible option (maybe with convert=TRUE as the default because that's the current behaviour?). Have you tested what effect convert=TRUE/FALSE makes on interacting with the object you get from SCE2AnnData()? So if I did adata$write_h5ad() does that work either way?

rcannood commented 1 year ago

Indeed, setters or functions without any clear return value should work regardless.

However, I think I stumbled upon the 'correct' way to solve this problem. Namely, predefined conversion functions (e.g. py_to_r.collections.OrderedDict) use disable_conversion_scope(...) to ensure that fields in a Python object (e.g. the Python AnnData) are not converted to R within a certain scope.

I updated my branch so that, at the beginning of AnnData2SCE, the disable_conversion_scope function is used ( See https://github.com/theislab/zellkonverter/compare/master...rcannood:zellkonverter:fix/add_py_to_r?expand=1#diff-4c7ea0a920b659d712a5256b98d7e4072805ee394c295eb01c6ce7c990c37e8cR106-R112 ) and then py_to_r(...) is used to explicitly convert everything to R objects where needed.

lazappi commented 1 year ago

Ah, we should have thought to check if {reticulate} had any tricks we could steal 😸! So with this, we don't need any convert argument, it's just internal changes, right?

If you are happy that this works (at least enough for me to so some more testing) I would definitely welcome a PR from your branch.