Closed giovp closed 3 years ago
Nope, you're not doing anything wrong. I just tried and got the same message 😸.
The problem is that two of the metadata columns are nested lists, which apparently breaks things. It works if you exclude those columns:
> sce <- SingleCellExperiment(list(counts=counts),
+ colData=meta[, 1:12],
+ )
> writeH5AD(sce, "mouse_gastro.h5ad")
Note: using the 'counts' assay as the X matrix
/Users/luke.zappia/Library/Caches/basilisk/1.2.0/zellkonverter-1.0.0/anndata_env/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
if is_string_dtype(df[key]) and not is_categorical(df[key])
... storing 'embryo' as categorical
... storing 'pos' as categorical
If you need that info I would suggest storing it in `metadata(sce)` before saving to disk. That should put it in `adata.uns`. I'm not sure if this nested structure is possible in pandas, but we should probably handle it better, at least with a more useful error message.
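The suggestion above could be sketched roughly as follows. This is a minimal base R illustration, not zellkonverter's own code: `split_list_columns()`, the toy `meta` data frame and the `nested_coldata` name are all hypothetical. It uses a plain `data.frame`; for an S4Vectors `DataFrame`, list-like S4 columns may need an extra check (for example `is(x, "List")`), which is an assumption not tested here.

```R
# Hypothetical helper: separate list columns from ordinary (atomic) columns.
split_list_columns <- function(df) {
  # TRUE for every column that is itself a list (i.e. a nested column)
  is_nested <- vapply(df, is.list, logical(1))
  list(
    flat = df[, !is_nested, drop = FALSE],
    nested = as.list(df[, is_nested, drop = FALSE])
  )
}

# Toy metadata with one nested list column (made-up values)
meta <- data.frame(embryo = c("e1", "e2"), pos = c("p1", "p2"))
meta$neighbours <- list(c("a", "b"), "c")

parts <- split_list_columns(meta)

# With SingleCellExperiment and zellkonverter loaded, the pieces could then
# be used along these lines (not run here):
# sce <- SingleCellExperiment(list(counts = counts), colData = parts$flat)
# metadata(sce)$nested_coldata <- parts$nested
# writeH5AD(sce, "mouse_gastro.h5ad")
```

Here `parts$flat` keeps only the columns that are safe to pass as `colData`, while `parts$nested` holds the list columns so they can be stored in `metadata(sce)` instead of being lost.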
Good point, I'll remove it for now, and thanks for the explanation! Feel free to close if you think so.
@lazappi should we implement some `tryCatch` blocks around some of the conversions? Or maybe check for wacky columns before we attempt to pass them into Python? Can't remember whether we're already doing this.
There's already something along these lines for stuff in `metadata` but I don't think there is for `rowData`/`colData`. We should probably have something, but I haven't thought about whether it is better to check for specific things (like weird columns) in R or just try to convert and fail in a nicer way. The second is more general, but I think it might be difficult to identify the exact problem for the user. Possibly we need some combination: catch the obvious things in R and fail better if there is something we haven't thought of?
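To make the "combination" idea concrete, here is a rough base R sketch of what it might look like. It is only an illustration under assumptions: `check_columns()` and `with_context()` are hypothetical names, not existing zellkonverter functions, and the real conversion step is stood in for by a function that simply fails.

```R
# Hypothetical pre-check: catch the obvious problem (list columns) in R
# and name the offending columns in the error message.
check_columns <- function(df, what = "colData") {
  is_nested <- vapply(df, is.list, logical(1))
  if (any(is_nested)) {
    stop(
      "List columns in ", what, " cannot be converted: ",
      paste(names(df)[is_nested], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical wrapper: run a conversion step and, if it fails for a reason
# the pre-check did not anticipate, rethrow the error with some context.
with_context <- function(convert, what = "colData") {
  tryCatch(
    convert(),
    error = function(e) {
      stop("Conversion of ", what, " failed: ", conditionMessage(e),
           call. = FALSE)
    }
  )
}

# Toy metadata with a nested list column (made-up values)
meta <- data.frame(embryo = c("e1", "e2"))
meta$neighbours <- list(c("a", "b"), "c")

# Capture the messages instead of stopping, to show what the user would see
check_msg <- tryCatch(check_columns(meta),
                      error = function(e) conditionMessage(e))
wrap_msg <- tryCatch(with_context(function() stop("boom")),
                     error = function(e) conditionMessage(e))
```

The pre-check gives a specific message for the known case, while the wrapper only adds context for everything else, so the original Python or R error text is still visible to the user.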
I haven't checked yet, but I'm not entirely sure whether this particular issue is in the conversion or in writing the `.h5ad` file.
Just ran into this issue again with a list column in `rowData`, which took me a frustrating amount of time to work out.
The annoying thing is that how it fails seems to change depending on the content of the column. Sometimes it works but the conversion is a bit messed up (and takes forever for any reasonably sized dataset), and other times you get one of a variety of errors (from both the R and Python side).
I think the safest thing is just to skip any list (or non-vector) columns. Possibly they could be stashed in `metadata`, letting the checks there decide if they can be converted at all.
with this dataset https://marionilab.cruk.cam.ac.uk/SpatialMouseAtlas/
am I doing something wrong?
sessioninfo
```R
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] zellkonverter_0.99.5 SingleCellExperiment_1.11.8 SummarizedExperiment_1.19.9 Biobase_2.49.1 GenomicRanges_1.41.6 GenomeInfoDb_1.25.11
[7] IRanges_2.23.10 S4Vectors_0.27.13 BiocGenerics_0.35.4 MatrixGenerics_1.1.3 matrixStats_0.57.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 XVector_0.29.3 magrittr_1.5 rappdirs_0.3.1 zlibbioc_1.35.0 lattice_0.20-41 rlang_0.4.7
[8] chemspiderapi_0.0.2.0003 tools_4.0.2 grid_4.0.2 basilisk_1.1.18 Matrix_1.2-18 GenomeInfoDbData_1.2.4 purrr_0.3.4
[15] basilisk.utils_1.1.11 bitops_1.0-6 RCurl_1.98-1.2 curl_4.3 DelayedArray_0.15.16 compiler_4.0.2 filelock_1.0.2
[22] reticulate_1.16 jsonlite_1.7.1
```