theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

`readH5AD(..., reader="R")` fails with recent AnnData formats? #78

Closed mtmorgan closed 1 year ago

mtmorgan commented 1 year ago

cellxgene provides H5AD files for each data set. A recent download of this one (sorry, there is no direct URL; click on the cloud download button) has content like the following (from `rhdf5::h5ls()`):

10                                           /obs                   assay_ontology_term_id   H5I_GROUP
11                    /obs/assay_ontology_term_id                               categories H5I_DATASET  STRING         1
12                    /obs/assay_ontology_term_id                                    codes H5I_DATASET INTEGER     46500

whereas older downloads have

5                     /obs                       __categories   H5I_GROUP
6        /obs/__categories                              assay H5I_DATASET  STRING         1
7        /obs/__categories             assay_ontology_term_id H5I_DATASET  STRING         1
8        /obs/__categories                   author_cell_type H5I_DATASET  STRING        30
9        /obs/__categories                          cell_type H5I_DATASET  STRING        28

I guess this is a change in the AnnData on-disk representation? It causes `h5ad <- readH5AD(local_file, reader = "R", use_hdf5 = TRUE)` to fail. The error is translated to a warning, and the net result is that no colData is added to the SummarizedExperiment:

Warning message:
In value[[3L]](cond) : setting 'colData' failed for
  '/Users/ma38727/Library/Caches/org.R-project.R/R/cellxgenedp/f69ba4b3-fc45-483c-8a7c-434fd056aeed.H5AD':
  cannot coerce class "list" to a DataFrame
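
To make the difference between the two listings concrete, here is a minimal Python sketch (standard library only) of how a reader might tell the layouts apart and rebuild a categorical column. The function names `categorical_layout` and `decode_categorical` are hypothetical and are not part of zellkonverter or anndata. Two assumptions are made: a negative code marks a missing value (following the pandas convention), and the labels `"A"` and `"B"` are placeholders rather than values from the file.

```python
def categorical_layout(paths):
    """Guess the categorical layout from HDF5 object paths (e.g. from h5ls).

    Returns "legacy" when a shared /obs/__categories group is present,
    "per-column" when some /obs/<column> group holds both a `categories`
    and a `codes` child, and "none" otherwise.
    """
    path_set = set(paths)
    if "/obs/__categories" in path_set:
        return "legacy"
    suffix = "/categories"
    for path in path_set:
        if path.startswith("/obs/") and path.endswith(suffix):
            column_group = path[: -len(suffix)]
            if column_group + "/codes" in path_set:
                return "per-column"
    return "none"


def decode_categorical(codes, categories):
    """Map integer codes to labels; negative codes are assumed to mean missing."""
    return [categories[code] if code >= 0 else None for code in codes]


# Paths taken from the two listings above.
new_paths = [
    "/obs/assay_ontology_term_id",
    "/obs/assay_ontology_term_id/categories",
    "/obs/assay_ontology_term_id/codes",
]
old_paths = [
    "/obs/__categories",
    "/obs/__categories/assay",
    "/obs/__categories/cell_type",
]

print(categorical_layout(new_paths))  # per-column
print(categorical_layout(old_paths))  # legacy
print(decode_categorical([0, 1, -1, 0], ["A", "B"]))  # ['A', 'B', None, 'A']
```

In both layouts the column is recovered by indexing the category labels with the integer codes; only the location of the labels differs. A reader that only looks for `/obs/__categories` would find each column to be a group rather than a vector in the newer files, which seems consistent with the "list" coercion failure in the warning above.
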

Will the R-based reader be updated, or is the best strategy to switch to the python reader?

lazappi commented 1 year ago

I haven't looked into it, but I'm guessing this file uses the AnnData v0.8 format. At the moment the safest and most reliable approach is to use the Python reader. The R reader is currently neglected and needs a fair bit of work, but that won't happen before the next release.