theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

`write_h5ad` is not happy with unnamed lists in the metadata #59

Open LTLA opened 2 years ago

LTLA commented 2 years ago

Not really sure what's going on here, but:

library(SingleCellExperiment)
example(SingleCellExperiment)
metadata(sce)$WHEE <- list(list(X=1, Y=2))

library(zellkonverter)
writeH5AD(sce, "foo.h5ad")
## ℹ Using the 'counts' assay as the X matrix
## Error in py_call_impl(callable, dots$args, dots$keywords) :
##   TypeError: Can't implicitly convert non-string objects to strings
## 
## Above error raised while writing key 'uns/WHEE' of <class 'h5py._hl.files.File'> from /.
## 
## Above error raised while writing key 'uns/WHEE' of <class 'h5py._hl.files.File'> from /.
## 
## Detailed traceback:
##   File "/Users/luna/Library/Caches/org.R-project.R/R/basilisk/1.6.0/zellkonverter/1.4.0/zellkonverterAnnDataEnv/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1911, in write_h5ad
##     as_dense=as_dense,
##   File "/Users/luna/Library/Caches/org.R-project.R/R/basilisk/1.6.0/zellkonverter/1.4.0/zellkonverterAnnDataEnv/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 118, in write_h5ad
##     write_attribute(f, "uns", adata.uns, dataset_kwargs=dataset_kwargs)
##   File "/Users/luna/Library/Caches/org.R-project.R/R/basilisk/1.6.0/zellkonverter/1.4.0/zellkonverterAnnDataEnv/lib/python3.7/functools.py", line 840, in wrapper
##     return dispatch(args[0].__class__)(*args, **kw)
##   File "/Users/luna/Library/Caches/or

Brief investigations indicate that it goes through SCE2AnnData fine but fails in the adata$write_h5ad step. The error goes away if the list is named. I would speculate that an unnamed list in the SCE metadata somehow gets stored inside the AnnData's metadata in the same place as the assay names, leading the Python-side code to believe that they are additional assays, and try to open file handles to save them? Or something. I don't think I wrote that part of the function.

Session information

```
R version 4.1.2 Patched (2022-02-10 r81713)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Users/luna/Software/R/R-4-1-branch/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/R-4-1-branch/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] zellkonverter_1.4.0         SingleCellExperiment_1.16.0
 [3] SummarizedExperiment_1.24.0 Biobase_2.54.0
 [5] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1
 [7] IRanges_2.28.0              S4Vectors_0.32.3
 [9] BiocGenerics_0.40.0         MatrixGenerics_1.6.0
[11] matrixStats_0.61.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8             XVector_0.34.0         zlibbioc_1.40.0
 [4] here_1.0.1             lattice_0.20-45        tools_4.1.2
 [7] parallel_4.1.2         grid_4.1.2             png_0.1-7
[10] cli_3.1.1              basilisk_1.6.0         rprojroot_2.0.2
[13] Matrix_1.4-0           GenomeInfoDbData_1.2.7 dir.expiry_1.2.0
[16] bitops_1.0-7           basilisk.utils_1.6.0   RCurl_1.98-1.6
[19] glue_1.6.1             DelayedArray_0.20.0    compiler_4.1.2
[22] filelock_1.0.2         jsonlite_1.7.3         reticulate_1.24
```

ivirshup commented 2 years ago

The issue is that AnnData doesn't have a way of encoding lists of non-scalars. It's not really clear how such a list would fit with the Group or Array model of HDF5 and Zarr. Ideas welcome.
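
One possible stopgap, sketched in plain Python: give every element of a list of non-scalars a name based on its position, so the structure becomes nested mappings that line up with Groups. This is only an illustration. `name_unnamed_lists` is a hypothetical helper, not part of anndata or zellkonverter, and it assumes the unnamed R list reaches the Python side as a list of dicts.

```python
from collections.abc import Mapping

# Types treated as scalars for this sketch (an assumption, not anndata's rule).
SCALARS = (str, bytes, int, float, bool, type(None))


def name_unnamed_lists(value):
    """Recursively replace lists that hold non-scalars with dicts keyed by position.

    Hypothetical helper, not part of anndata or zellkonverter.
    """
    if isinstance(value, Mapping):
        # Mappings already correspond to a Group: keep the keys, recurse into values.
        return {str(k): name_unnamed_lists(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        if all(isinstance(item, SCALARS) for item in value):
            # A flat list of scalars can be stored as a single Array.
            return list(value)
        # A list of non-scalars has no Group or Array equivalent, so give
        # each element a name derived from its index.
        return {str(i): name_unnamed_lists(item) for i, item in enumerate(value)}
    return value


# Python-side analogue of metadata(sce)$WHEE <- list(list(X=1, Y=2))
uns = {"WHEE": [{"X": 1, "Y": 2}]}
print(name_unnamed_lists(uns))
# → {'WHEE': {'0': {'X': 1, 'Y': 2}}}
```

The downside is that the round trip is lossy: on reading, a mapping with keys "0", "1", ... cannot be told apart from a list that was named that way on purpose, so a real fix would probably need some marker for "this was a list".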