Error writing SCE with List columns in colData

giovp commented 3 years ago

library(SingleCellExperiment)
library(zellkonverter)

counts <- readRDS("/Users/giovanni.palla/Datasets/SpatialMouseGastro/counts.rds")
meta <- readRDS("/Users/giovanni.palla/Datasets/SpatialMouseGastro/metadata.rds")

sce <- SingleCellExperiment(list(counts=counts),
                            colData=meta)

writeH5AD(sce, file = "/Users/giovanni.palla/Datasets/SpatialMouseGastro/mouse_gastro.h5ad")

Note: using the 'counts' assay as the X matrix
Error in py_set_attr_impl(x, name, value) : 
  Evaluation error: ValueError: Mixing dicts with non-Series may lead to ambiguous ordering..

This is with the dataset at https://marionilab.cruk.cam.ac.uk/SpatialMouseAtlas/

Am I doing something wrong?

sessioninfo

```R
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] zellkonverter_0.99.5 SingleCellExperiment_1.11.8 SummarizedExperiment_1.19.9 Biobase_2.49.1 GenomicRanges_1.41.6 GenomeInfoDb_1.25.11
 [7] IRanges_2.23.10 S4Vectors_0.27.13 BiocGenerics_0.35.4 MatrixGenerics_1.1.3 matrixStats_0.57.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5 XVector_0.29.3 magrittr_1.5 rappdirs_0.3.1 zlibbioc_1.35.0 lattice_0.20-41 rlang_0.4.7
 [8] chemspiderapi_0.0.2.0003 tools_4.0.2 grid_4.0.2 basilisk_1.1.18 Matrix_1.2-18 GenomeInfoDbData_1.2.4 purrr_0.3.4
[15] basilisk.utils_1.1.11 bitops_1.0-6 RCurl_1.98-1.2 curl_4.3 DelayedArray_0.15.16 compiler_4.0.2 filelock_1.0.2
[22] reticulate_1.16 jsonlite_1.7.1
```

lazappi commented 3 years ago

Nope, you're not doing anything wrong. I just tried and got the same message 😸.

The problem is that two of the metadata columns are nested lists, which apparently breaks the conversion. It works if you exclude those columns:

> sce <- SingleCellExperiment(list(counts=counts),
+                             colData=meta[, 1:12],
+ )
> writeH5AD(sce, "mouse_gastro.h5ad")
Note: using the 'counts' assay as the X matrix
/Users/luke.zappia/Library/Caches/basilisk/1.2.0/zellkonverter-1.0.0/anndata_env/lib/python3.7/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  if is_string_dtype(df[key]) and not is_categorical(df[key])
... storing 'embryo' as categorical
... storing 'pos' as categorical

If you need that info I would suggest storing it in metadata(sce) before saving to disk. That should put it in adata.uns. I'm not sure if this nested structure is possible in pandas but we should probably handle it better, at least with a more useful error message.
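A rough sketch of that workaround, not part of zellkonverter: find the list columns, split them off, and keep them separately so they can go into metadata(sce). `is_list_col`, `split_list_columns`, the `segmentation` column and the `colData_lists` name are all hypothetical and only here for illustration. The demo uses a plain data.frame so it runs without Bioconductor; it assumes the DataFrame returned by colData(sce) can be indexed the same way, and that list columns there are either base lists or S4Vectors-style List objects.

```r
# Treat base R lists and (assumed) S4Vectors-style List objects as list columns.
is_list_col <- function(x) is.list(x) || inherits(x, "List")

# Hypothetical helper: separate list columns from ordinary vector columns.
split_list_columns <- function(df) {
  cols <- colnames(df)
  is_list <- vapply(cols, function(nm) is_list_col(df[[nm]]), logical(1))
  list(
    kept = df[, !is_list, drop = FALSE],
    stashed = lapply(
      stats::setNames(cols[is_list], cols[is_list]),
      function(nm) df[[nm]]
    )
  )
}

# Toy stand-in for the real metadata; 'segmentation' is a made-up list column.
meta <- data.frame(embryo = c("e1", "e2", "e3"), pos = c("p1", "p2", "p3"))
meta$segmentation <- list(c(0.1, 0.2), c(0.3, 0.4), c(0.5, 0.6))

# Quick diagnostic: which class does each column have?
col_classes <- vapply(colnames(meta), function(nm) class(meta[[nm]])[1], character(1))
print(col_classes)

parts <- split_list_columns(meta)

# With a real SingleCellExperiment the pieces might then be used like this
# (untested assumption, shown as comments only):
#   parts <- split_list_columns(colData(sce))
#   colData(sce) <- parts$kept
#   metadata(sce)$colData_lists <- parts$stashed
#   writeH5AD(sce, "mouse_gastro.h5ad")
```

Whether the stashed lists then survive the trip into adata.uns depends on their content, so it is worth checking the result after reading the file back.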

giovp commented 3 years ago

Good point, I'll remove it for now, and thanks for the explanation! Feel free to close if you think that's appropriate.

LTLA commented 3 years ago

@lazappi should we implement some tryCatch blocks around some of the conversions? Or maybe check for wacky columns before we attempt to pass them into Python? Can't remember whether we're already doing this.

lazappi commented 3 years ago

There's already something along these lines for stuff in metadata but I don't think there is for rowData/colData. We should probably have something, but I haven't thought about whether it is better to check for specific things (like weird columns) in R or just try to convert and fail in a nicer way. The second is more general, but I think it might be difficult to tell the user exactly what the problem is. Possibly we need some combination: catch the obvious things in R and fail better if there is something we haven't thought of?

I haven't checked yet, so I'm not entirely sure whether this particular issue is in the conversion or in writing the .h5ad file.
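The combination described above could look roughly like the following. This is only a sketch under assumptions, not zellkonverter's actual code: `check_columns` and `convert_with_context` are hypothetical names, and the `convert` argument is a stand-in for whatever step really hands the table to Python.

```r
# Treat base R lists and (assumed) S4Vectors-style List objects as list columns.
is_list_col <- function(x) is.list(x) || inherits(x, "List")

# Step 1: catch the obvious problem in R and name the offending columns.
check_columns <- function(df, what = "colData") {
  cols <- colnames(df)
  bad <- cols[vapply(cols, function(nm) is_list_col(df[[nm]]), logical(1))]
  if (length(bad) > 0) {
    stop(
      sprintf("%s contains list columns that cannot be converted: %s",
              what, paste(bad, collapse = ", ")),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Step 2: for anything not anticipated, re-raise the error with context.
convert_with_context <- function(convert, df, what = "colData") {
  check_columns(df, what)
  tryCatch(
    convert(df),
    error = function(e) {
      stop(sprintf("Failed to convert %s: %s", what, conditionMessage(e)),
           call. = FALSE)
    }
  )
}

# Demo with a toy table; 'nested' is a made-up list column.
meta <- data.frame(embryo = c("e1", "e2"))
meta$nested <- list(1:2, 3:4)

# The up-front check names the column instead of surfacing a pandas error.
check_msg <- tryCatch(check_columns(meta), error = conditionMessage)

# A stand-in converter that always fails, to show the wrapped message.
failing_convert <- function(df) stop("ValueError: something unexpected")
wrapped_msg <- tryCatch(
  convert_with_context(failing_convert, meta[, "embryo", drop = FALSE]),
  error = conditionMessage
)

print(check_msg)
print(wrapped_msg)
```

The up-front check gives a precise message for the known case, while the tryCatch wrapper at least tells the user which part of the object was being converted when something unforeseen fails.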

lazappi commented 3 years ago

Just ran into this issue again with a list column in rowData which took me a frustrating amount of time to work out.

The annoying thing is that how it fails seems to change depending on the content of the column. Sometimes it works but the conversion is a bit messed up (and takes forever for any reasonably sized dataset), and other times you get one of a variety of errors (from both the R and Python sides).

Think the safest thing is just to skip any list (or non-vector) columns. Possibly they could be stashed in metadata and let the checks there decide if they can be converted at all.

theislab / zellkonverter

Error writing SCE with List columns in colData #26