theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

Conversion Error: Conversion from numpy array type 20 is not supported #45

Closed const-ae closed 1 year ago

const-ae commented 3 years ago

Hi Luke,

I tried to load a single-cell dataset from https://singlecell.broadinstitute.org/single_cell/study/SCP1052/covid-19-lung-autopsy-samples#study-summary, but I get the following error:

> sce2 <- zellkonverter::readH5AD("lung.h5ad")
Error in py_ref_to_r(x) : 
  Conversion from numpy array type 20 is not supported
In addition: Warning message:
In AnnData2SCE(adata, hdf5_backed = backed) :
  the 'W_harmony' item in 'uns' cannot be converted to an R object and has been skipped

in line https://github.com/theislab/zellkonverter/blob/069239ee6ae73d2b2f205681c15056e33de3a982/R/konverter.R#L160

I assume that the problem is the COMPOUND datatype in /varm, however I am not quite sure.

> rhdf5::h5ls("lung.h5ad")
                                   group                            name       otype   dclass         dim
0                                      /                               X   H5I_GROUP                     
1                                     /X                            data H5I_DATASET    FLOAT   138842815
2                                     /X                         indices H5I_DATASET  INTEGER   138842815
3                                     /X                          indptr H5I_DATASET  INTEGER      106793
4                                      /                          layers   H5I_GROUP                     
5                                /layers                          counts   H5I_GROUP                     
6                         /layers/counts                            data H5I_DATASET    FLOAT   138842815
7                         /layers/counts                         indices H5I_DATASET  INTEGER   138842815
8                         /layers/counts                          indptr H5I_DATASET  INTEGER      106793
9                                /layers                      winsorized   H5I_GROUP                     
10                    /layers/winsorized                            data H5I_DATASET    FLOAT   138842815
11                    /layers/winsorized                         indices H5I_DATASET  INTEGER   138842815
12                    /layers/winsorized                          indptr H5I_DATASET  INTEGER      106793
13                                     /                             obs   H5I_GROUP                     
14                                  /obs                         Cluster H5I_DATASET  INTEGER      106792
15                                  /obs                      SubCluster H5I_DATASET  INTEGER      106792
16                                  /obs                          Viral+ H5I_DATASET     ENUM      106792
17                                  /obs                    __categories   H5I_GROUP                     
18                     /obs/__categories                         Cluster H5I_DATASET   STRING          12
19                     /obs/__categories                      SubCluster H5I_DATASET   STRING          39
20                     /obs/__categories                     compartment H5I_DATASET   STRING           6
21                     /obs/__categories                         disease H5I_DATASET   STRING           1
22                     /obs/__categories                           donor H5I_DATASET   STRING          24
23                     /obs/__categories                  leiden_res_1.3 H5I_DATASET   STRING          20
24                     /obs/__categories                    leiden_res_2 H5I_DATASET   STRING          28
25                     /obs/__categories                          method H5I_DATASET   STRING           3
26                     /obs/__categories              predicted_celltype H5I_DATASET   STRING          28
27                                  /obs                      barcodekey H5I_DATASET   STRING      106792
28                                  /obs                     compartment H5I_DATASET  INTEGER      106792
29                                  /obs                         disease H5I_DATASET  INTEGER      106792
30                                  /obs                           donor H5I_DATASET  INTEGER      106792
31                                  /obs                         doublet H5I_DATASET     ENUM      106792
32                                  /obs                  leiden_res_1.3 H5I_DATASET  INTEGER      106792
33                                  /obs                    leiden_res_2 H5I_DATASET  INTEGER      106792
34                                  /obs                          method H5I_DATASET  INTEGER      106792
35                                  /obs                           n_UMI H5I_DATASET  INTEGER      106792
36                                  /obs                         n_genes H5I_DATASET  INTEGER      106792
37                                  /obs                    percent_mito H5I_DATASET    FLOAT      106792
38                                  /obs              predicted_celltype H5I_DATASET  INTEGER      106792
39                                     /                            obsm   H5I_GROUP                     
40                                 /obsm                       X_harmony H5I_DATASET    FLOAT 75 x 106792
41                                 /obsm                           X_pca H5I_DATASET    FLOAT 75 x 106792
42                                 /obsm                          X_umap H5I_DATASET    FLOAT  2 x 106792
43                                 /obsm                  sig_background H5I_DATASET    FLOAT 50 x 106792
44                                     /                             raw   H5I_GROUP                     
45                                  /raw                               X   H5I_GROUP                     
46                                /raw/X                            data H5I_DATASET    FLOAT   138842815
47                                /raw/X                         indices H5I_DATASET  INTEGER   138842815
48                                /raw/X                          indptr H5I_DATASET  INTEGER      106793
49                                  /raw                             var   H5I_GROUP                     
50                              /raw/var                       featureid H5I_DATASET   STRING       30983
51                              /raw/var                      featurekey H5I_DATASET   STRING       30983
52                                     /                             uns   H5I_GROUP                     
53                                  /uns                  Cluster_colors H5I_DATASET   STRING          13
54                                  /uns                 Nuc-Cell_colors H5I_DATASET   STRING           3
55                                  /uns               SubCluster_colors H5I_DATASET   STRING          39
56                                  /uns                       W_harmony   H5I_GROUP                     
57                        /uns/W_harmony                            data H5I_DATASET    FLOAT    16228024
58                        /uns/W_harmony                         indices H5I_DATASET  INTEGER    16228024
59                        /uns/W_harmony                          indptr H5I_DATASET  INTEGER      106793
60                                  /uns                          genome H5I_DATASET   STRING       ( 0 )
61                                  /uns           harmony_knn_distances H5I_DATASET    FLOAT 99 x 106792
62                                  /uns             harmony_knn_indices H5I_DATASET  INTEGER 99 x 106792
63                                  /uns manual_coarse_annotation_colors H5I_DATASET   STRING          14
64                                  /uns                   method_colors H5I_DATASET   STRING           3
65                                  /uns                        modality H5I_DATASET   STRING       ( 0 )
66                                  /uns                       neighbors   H5I_GROUP                     
67                        /uns/neighbors              connectivities_key H5I_DATASET   STRING       ( 0 )
68                        /uns/neighbors                   distances_key H5I_DATASET   STRING       ( 0 )
69                        /uns/neighbors                          params   H5I_GROUP                     
70                 /uns/neighbors/params                          method H5I_DATASET   STRING       ( 0 )
71                 /uns/neighbors/params                          metric H5I_DATASET   STRING       ( 0 )
72                 /uns/neighbors/params                     n_neighbors H5I_DATASET  INTEGER       ( 0 )
73                 /uns/neighbors/params                         use_rep H5I_DATASET   STRING       ( 0 )
74                        /uns/neighbors                       rp_forest   H5I_GROUP                     
75              /uns/neighbors/rp_forest                        children   H5I_GROUP                     
76     /uns/neighbors/rp_forest/children                            data H5I_DATASET  INTEGER  2 x 536989
77     /uns/neighbors/rp_forest/children                           start H5I_DATASET  INTEGER          21
78              /uns/neighbors/rp_forest                     hyperplanes   H5I_GROUP                     
79  /uns/neighbors/rp_forest/hyperplanes                            data H5I_DATASET    FLOAT 75 x 536989
80  /uns/neighbors/rp_forest/hyperplanes                           start H5I_DATASET  INTEGER          21
81              /uns/neighbors/rp_forest                         indices   H5I_GROUP                     
82      /uns/neighbors/rp_forest/indices                            data H5I_DATASET  INTEGER 15 x 268505
83      /uns/neighbors/rp_forest/indices                           start H5I_DATASET  INTEGER          21
84              /uns/neighbors/rp_forest                         offsets   H5I_GROUP                     
85      /uns/neighbors/rp_forest/offsets                            data H5I_DATASET    FLOAT      536989
86      /uns/neighbors/rp_forest/offsets                           start H5I_DATASET  INTEGER          21
87                                  /uns                             pca   H5I_GROUP                     
88                              /uns/pca                          params   H5I_GROUP                     
89                       /uns/pca/params             use_highly_variable H5I_DATASET     ENUM       ( 0 )
90                       /uns/pca/params                     zero_center H5I_DATASET     ENUM       ( 0 )
91                              /uns/pca                        variance H5I_DATASET    FLOAT          75
92                              /uns/pca                  variance_ratio H5I_DATASET    FLOAT          75
93                                  /uns              predictions_colors H5I_DATASET   STRING          28
94                                  /uns                   sample_colors H5I_DATASET   STRING          24
95                                  /uns                            umap   H5I_GROUP                     
96                             /uns/umap                          params   H5I_GROUP                     
97                      /uns/umap/params                               a H5I_DATASET    FLOAT       ( 0 )
98                      /uns/umap/params                               b H5I_DATASET    FLOAT       ( 0 )
99                                     /                             var   H5I_GROUP                     
100                                 /var                       featureid H5I_DATASET   STRING       30983
101                                 /var                      featurekey H5I_DATASET   STRING       30983
102                                    /                            varm   H5I_GROUP                     
103                                /varm                             PCs H5I_DATASET    FLOAT  75 x 30983
104                                /varm                          de_res H5I_DATASET COMPOUND       30983

I could try to provide a reduced file, as the original is quite big (1.5 GB), but I am not sure what the best way to reduce the size of an h5ad file is.

Best, Constantin

lazappi commented 3 years ago

Hi @const-ae

Thanks for the issue (and for giving zellkonverter a go)! It definitely could be whatever is in varm. I should be able to download the dataset from the link and have a look. If you want to make a smaller example, that would be great. The easiest way is probably to load it in Python, subset to a smaller number of genes/cells, and save a new .h5ad. Don't worry about it if that sounds like too much effort, though.

lazappi commented 3 years ago

Ok, I have spent some time looking into this. First, I can confirm this is an issue (using the smaller B_Plasma.h5ad file from the same link). The culprit is indeed the COMPOUND data stored in adata.varm["de_res"]. When you read this from disk in Python you get a 1-dimensional ndarray of tuples, which seems really weird but is the result of using pandas.DataFrame.to_records to create a numpy.recarray (with some information lost during the trip to disk).
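The "type 20" in the error message is NumPy's type number for void (structured) data. The sketch below shows this on a small hand-built structured array; the field names `log_fc` and `rank` are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# A structured array with two hypothetical per-gene fields, similar in shape
# to what a record array looks like after being read back from disk.
de_res = np.array(
    [(0.5, 1), (1.5, 2), (2.5, 3)],
    dtype=[("log_fc", "f8"), ("rank", "i8")],
)

n_dims = de_res.ndim              # 1: one record ("tuple") per gene
field_names = de_res.dtype.names  # ("log_fc", "rank")
type_num = de_res.dtype.num       # 20, the number reported in the error
type_kind = de_res.dtype.kind     # "V" for void/structured data
```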

For zellkonverter there are a couple of things that need to be done:

  1. Add checks to the varm conversion so that at least there isn't an error if something fails to convert
  2. Look at handling conversion of these objects (probably worth it as I suspect this will be an issue for anything from the Broad Single Cell Portal). Here are some links about detecting recarray objects and converting them to "normal" numpy arrays.

Should be able to get at least the conversion check into the next release, not so sure about the recarray conversion.
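A rough sketch of what detecting and converting these objects could look like on the Python side, using only NumPy. The helper names are hypothetical and this is not zellkonverter's implementation; `to_matrix` additionally assumes every field is numeric.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def is_structured(arr):
    """True for record/structured arrays, i.e. arrays whose dtype has named fields."""
    return isinstance(arr, np.ndarray) and arr.dtype.names is not None


def to_columns(arr):
    """Split a structured array into a dict of ordinary 1-D arrays, one per field."""
    return {name: np.asarray(arr[name]) for name in arr.dtype.names}


def to_matrix(arr):
    """Convert a structured array with numeric fields to a plain 2-D array."""
    return rfn.structured_to_unstructured(arr)
```

The dict form keeps field names and mixed types, which maps naturally onto a data frame; the matrix form only makes sense when all fields share a compatible numeric type.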

const-ae commented 3 years ago

This is great. Thank you so much for your effort :)

lazappi commented 3 years ago

Ok, I've added checks that should prevent this error in the current dev version (v1.1.8) but without any conversion. I would like to be able to handle the conversion as well but I think that will require rewriting some things so might not happen for this release.

const-ae commented 3 years ago

This is already great. Thank you for the effort :)

lee-t commented 2 years ago

I'm having a similar issue

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] reticulate_1.25             SpatialExperiment_1.6.1    
 [3] zellkonverter_1.6.4         SingleCellExperiment_1.18.0
 [5] SummarizedExperiment_1.26.1 Biobase_2.56.0             
 [7] GenomicRanges_1.48.0        GenomeInfoDb_1.32.3        
 [9] IRanges_2.30.0              S4Vectors_0.34.0           
[11] BiocGenerics_0.42.0         MatrixGenerics_1.8.1       
[13] matrixStats_0.62.0          shiny_1.7.2                

Error is similar to OP:

> sce <- zellkonverter::readH5AD(h5ad.file)
Error in py_ref_to_r(x) : 
  Conversion from numpy array type 20 is not supported

I did try rolling back to Zellkonverter 1.6.0 and the error is not present. Is this perhaps an anndata version issue?

lazappi commented 2 years ago

There shouldn't be any major differences between {zellkonverter} v1.6.0 and v1.6.4. Could you please post the output with verbose = TRUE and if possible share the file you are using?

lee-t commented 2 years ago

verbose = TRUE output:

> sce <- zellkonverter::readH5AD(h5ad.file,verbose = TRUE)
ℹ Using anndata version 0.8.0
ℹ Using the Python reader
✔ Read .../.../.../.../.../adata_Tier1_2_3_combined.h5ad [32.1s]
Error in py_ref_to_r(x) : 
  Conversion from numpy array type 20 is not supported
✖ Converting AnnData to SingleCellExperiment ... failed

Traceback on error:

Error in py_ref_to_r(x) : 
Conversion from numpy array type 20 is not supported
23. stop(structure(list(message = "Conversion from numpy array type 20 is not supported", 
    call = py_ref_to_r(x), cppstack = NULL), class = c("Rcpp::exception", 
    "C++Error", "error", "condition")))
22. py_ref_to_r(x)
21. py_to_r.default(result)
20. py_to_r(result)
19. python_builtins$dict(x)
18. py_to_r.collections.abc.Mapping(x)
17. py_to_r(x)
16. py_maybe_convert(object, py_has_convert(x))
15. py_get_attr_or_item(x, name, TRUE)
14. `$.python.builtin.object`(private$.anndata, uns)
13. private$.anndata$uns
12. py_to_r_ifneedbe(private$.anndata$uns)
11. (function (value) 
    {
    if (missing(value)) {
    py_to_r_ifneedbe(private$.anndata$uns) ...
10. py_resolve_dots(list(...))
 9. py_builtins$list(adata$uns$keys())
 8. .convert_anndata_slot(adata, "uns", py_builtins$list(adata$uns$keys()), 
    "metadata", select = uns)
 7. AnnData2SCE(adata, X_name = X_name, hdf5_backed = backed, verbose = verbose, 
    ...)
 6. (function (file, X_name = NULL, backed = FALSE, verbose = NULL, 
    ...) 
    {
    .ui_info("Using the {.field Python} reader") ...
 5. do.call(.basilisk.fun, .basilisk.args)
 4. evalq(do.call(.basilisk.fun, .basilisk.args), envir = proc, enclos = proc)
 3. evalq(do.call(.basilisk.fun, .basilisk.args), envir = proc, enclos = proc)
 2. basiliskRun(env = env, fun = .H5ADreader, file = file, X_name = X_name, 
    backed = use_hdf5, verbose = verbose, ...)
 1. zellkonverter::readH5AD(h5ad.file)

I can't attach the object but here's the content


> rhdf5::h5ls(h5ad.file)
                                 group                                         name       otype   dclass           dim
0                                    /                                            X H5I_DATASET    FLOAT 990 x 1348582
1                                    /                                       layers   H5I_GROUP                       
2                                    /                                          obs   H5I_GROUP                       
3                                 /obs                                          FOV H5I_DATASET  INTEGER       1348582
4                                 /obs Median Cell counts Raw Data Technical_repeat H5I_DATASET    FLOAT       1348582
5                                 /obs                                     Mouse_ID   H5I_GROUP                       
6                        /obs/Mouse_ID                                   categories H5I_DATASET   STRING            15
7                        /obs/Mouse_ID                                        codes H5I_DATASET  INTEGER       1348582
8                                 /obs                                  Sample_type   H5I_GROUP                       
9                     /obs/Sample_type                                   categories H5I_DATASET   STRING             4
10                    /obs/Sample_type                                        codes H5I_DATASET  INTEGER       1348582
11                                /obs                                     Slice_ID   H5I_GROUP                       
12                       /obs/Slice_ID                                   categories H5I_DATASET   STRING            52
13                       /obs/Slice_ID                                        codes H5I_DATASET  INTEGER       1348582
14                                /obs                      Technical_repeat_number   H5I_GROUP                       
15        /obs/Technical_repeat_number                                   categories H5I_DATASET   STRING            20
16        /obs/Technical_repeat_number                                        codes H5I_DATASET  INTEGER       1348582
17                                /obs                                        Tier1   H5I_GROUP                       
18                          /obs/Tier1                                   categories H5I_DATASET   STRING             8
19                          /obs/Tier1                                        codes H5I_DATASET  INTEGER       1348582
20                                /obs                                        Tier2   H5I_GROUP                       
21                          /obs/Tier2                                   categories H5I_DATASET   STRING            64
22                          /obs/Tier2                                        codes H5I_DATASET  INTEGER       1348582
23                                /obs                           Tier2_and_3_merged   H5I_GROUP                       
24             /obs/Tier2_and_3_merged                                   categories H5I_DATASET   STRING            72
25             /obs/Tier2_and_3_merged                                        codes H5I_DATASET  INTEGER       1348582
26                                /obs                                       _index H5I_DATASET   STRING       1348582
27                                /obs                                     cell_IDs H5I_DATASET   STRING       1348582
28                                /obs                                   log_counts H5I_DATASET    FLOAT       1348582
29                                /obs                                     n_counts H5I_DATASET    FLOAT       1348582
30                                /obs                                      n_genes H5I_DATASET  INTEGER       1348582
31                                /obs                                  norm_factor H5I_DATASET    FLOAT       1348582
32                                /obs                                       sample   H5I_GROUP                       
33                         /obs/sample                                   categories H5I_DATASET   STRING            20
34                         /obs/sample                                        codes H5I_DATASET  INTEGER       1348582
35                                /obs                                            x H5I_DATASET    FLOAT       1348582
36                                /obs                                            y H5I_DATASET    FLOAT       1348582
37                                   /                                         obsm   H5I_GROUP                       
38                               /obsm                                        X_pca H5I_DATASET    FLOAT 940 x 1348582
39                               /obsm                                       X_umap H5I_DATASET    FLOAT   2 x 1348582
40                                   /                                         obsp   H5I_GROUP                       
41                               /obsp                               connectivities   H5I_GROUP                       
42                /obsp/connectivities                                         data H5I_DATASET    FLOAT      37326600
43                /obsp/connectivities                                      indices H5I_DATASET  INTEGER      37326600
44                /obsp/connectivities                                       indptr H5I_DATASET  INTEGER       1348583
45                               /obsp                                    distances   H5I_GROUP                       
46                     /obsp/distances                                         data H5I_DATASET    FLOAT      18734083
47                     /obsp/distances                                      indices H5I_DATASET  INTEGER      18734083
48                     /obsp/distances                                       indptr H5I_DATASET  INTEGER       1348583
49                                   /                                          raw   H5I_GROUP                       
50                                /raw                                            X H5I_DATASET    FLOAT 990 x 1348582
51                                /raw                                          var   H5I_GROUP                       
52                            /raw/var                                       _index H5I_DATASET   STRING           990
53                                /raw                                         varm   H5I_GROUP                       
54                                   /                                          uns   H5I_GROUP                       
55                                /uns                                     DAPI_dir   H5I_GROUP                       
56                       /uns/DAPI_dir                               062221_D9_m3_2 H5I_DATASET   STRING         ( 0 )
57                       /uns/DAPI_dir                              062921_D0_m3a_1 H5I_DATASET   STRING         ( 0 )
58                       /uns/DAPI_dir                              062921_D0_m3a_2 H5I_DATASET   STRING         ( 0 )
59                       /uns/DAPI_dir                              062921_D9_m2a_1 H5I_DATASET   STRING         ( 0 )
60                       /uns/DAPI_dir                              062921_D9_m2a_2 H5I_DATASET   STRING         ( 0 )
61                       /uns/DAPI_dir                               062921_D9_m5_1 H5I_DATASET   STRING         ( 0 )
62                       /uns/DAPI_dir                               062921_D9_m5_2 H5I_DATASET   STRING         ( 0 )
63                       /uns/DAPI_dir                               082421_D0_m6_1 H5I_DATASET   STRING         ( 0 )
64                       /uns/DAPI_dir                               082421_D0_m7_1 H5I_DATASET   STRING         ( 0 )
65                       /uns/DAPI_dir                              082421_D21_m1_1 H5I_DATASET   STRING         ( 0 )
66                       /uns/DAPI_dir                              082421_D21_m2_1 H5I_DATASET   STRING         ( 0 )
67                       /uns/DAPI_dir                               092421_D3_m1_1 H5I_DATASET   STRING         ( 0 )
68                       /uns/DAPI_dir                               092421_D3_m2_1 H5I_DATASET   STRING         ( 0 )
69                       /uns/DAPI_dir                               092421_D3_m3_1 H5I_DATASET   STRING         ( 0 )
70                       /uns/DAPI_dir                               092421_D3_m4_1 H5I_DATASET   STRING         ( 0 )
71                       /uns/DAPI_dir                               100221_D9_m2_1 H5I_DATASET   STRING         ( 0 )
72                       /uns/DAPI_dir                               100221_D9_m3_1 H5I_DATASET   STRING         ( 0 )
73                       /uns/DAPI_dir                               100221_D9_m3_2 H5I_DATASET   STRING         ( 0 )
74                       /uns/DAPI_dir                               100221_D9_m5_1 H5I_DATASET   STRING         ( 0 )
75                       /uns/DAPI_dir                               100221_D9_m5_2 H5I_DATASET   STRING         ( 0 )
76                                /uns                           Sample_type_colors H5I_DATASET   STRING             4
77                                /uns                                 Tier1_colors H5I_DATASET   STRING             8
78                                /uns                    Tier2_and_3_merged_colors H5I_DATASET   STRING            72
79                                /uns                                 Tier2_colors H5I_DATASET   STRING            64
80                                /uns                                          hvg   H5I_GROUP                       
81                            /uns/hvg                                       flavor H5I_DATASET   STRING         ( 0 )
82                                /uns                                       leiden   H5I_GROUP                       
83                         /uns/leiden                                       params   H5I_GROUP                       
84                  /uns/leiden/params                                 n_iterations H5I_DATASET  INTEGER         ( 0 )
85                  /uns/leiden/params                                 random_state H5I_DATASET  INTEGER         ( 0 )
86                  /uns/leiden/params                                   resolution H5I_DATASET    FLOAT         ( 0 )
87                                /uns                                        log1p   H5I_GROUP                       
88                          /uns/log1p                                         base H5I_DATASET  INTEGER         ( 0 )
89                                /uns                                    neighbors   H5I_GROUP                       
90                      /uns/neighbors                           connectivities_key H5I_DATASET   STRING         ( 0 )
91                      /uns/neighbors                                distances_key H5I_DATASET   STRING         ( 0 )
92                      /uns/neighbors                                       params   H5I_GROUP                       
93               /uns/neighbors/params                                       method H5I_DATASET   STRING         ( 0 )
94               /uns/neighbors/params                                       metric H5I_DATASET   STRING         ( 0 )
95               /uns/neighbors/params                                  n_neighbors H5I_DATASET  INTEGER         ( 0 )
96               /uns/neighbors/params                                 random_state H5I_DATASET  INTEGER         ( 0 )
97                                /uns                                          pca   H5I_GROUP                       
98                            /uns/pca                                       params   H5I_GROUP                       
99                     /uns/pca/params                          use_highly_variable H5I_DATASET     ENUM         ( 0 )
100                    /uns/pca/params                                  zero_center H5I_DATASET     ENUM         ( 0 )
101                           /uns/pca                                     variance H5I_DATASET    FLOAT           940
102                           /uns/pca                               variance_ratio H5I_DATASET    FLOAT           940
103                               /uns                       rank_genes_leiden_r3.3   H5I_GROUP                       
104        /uns/rank_genes_leiden_r3.3                               logfoldchanges H5I_DATASET COMPOUND           990
105        /uns/rank_genes_leiden_r3.3                                        names H5I_DATASET COMPOUND           990
106        /uns/rank_genes_leiden_r3.3                                       params   H5I_GROUP                       
107 /uns/rank_genes_leiden_r3.3/params                                  corr_method H5I_DATASET   STRING         ( 0 )
108 /uns/rank_genes_leiden_r3.3/params                                      groupby H5I_DATASET   STRING         ( 0 )
109 /uns/rank_genes_leiden_r3.3/params                                       method H5I_DATASET   STRING         ( 0 )
110 /uns/rank_genes_leiden_r3.3/params                                    reference H5I_DATASET   STRING         ( 0 )
111 /uns/rank_genes_leiden_r3.3/params                                      use_raw H5I_DATASET     ENUM         ( 0 )
112        /uns/rank_genes_leiden_r3.3                                        pvals H5I_DATASET COMPOUND           990
113        /uns/rank_genes_leiden_r3.3                                    pvals_adj H5I_DATASET COMPOUND           990
114        /uns/rank_genes_leiden_r3.3                                       scores H5I_DATASET COMPOUND           990
115                               /uns                                         umap   H5I_GROUP                       
116                          /uns/umap                                       params   H5I_GROUP                       
117                   /uns/umap/params                                            a H5I_DATASET    FLOAT         ( 0 )
118                   /uns/umap/params                                            b H5I_DATASET    FLOAT         ( 0 )
119                                  /                                          var   H5I_GROUP                       
120                               /var                                       _index H5I_DATASET   STRING           990
121                               /var                                  dispersions H5I_DATASET    FLOAT           990
122                               /var                             dispersions_norm H5I_DATASET    FLOAT           990
123                               /var                              highly_variable H5I_DATASET     ENUM           990
124                               /var                                        means H5I_DATASET    FLOAT           990
125                               /var                                      n_cells H5I_DATASET  INTEGER           990
126                                  /                                         varm   H5I_GROUP                       
127                              /varm                                          PCs H5I_DATASET    FLOAT     940 x 990
128                                  /                                         varp   H5I_GROUP   

lazappi commented 2 years ago

Ok, so it seems like something to do with adata.uns is the issue. Possibly we can add a check for this, but we need to know exactly what the problem is first. Can you please try excluding things using the uns argument? Start by setting uns = FALSE and then, if that works, try uns = c("item1", "item2", ...), adding/removing items until you find the culprit.
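If the file can be opened in Python, the same search could be done without trial and error by walking `adata.uns` and listing every structured NumPy array in it. This is a sketch under the assumption that the problem items are structured arrays (as the COMPOUND entries in the listing above suggest) and that `uns` behaves like a nested mapping; `find_structured` is a hypothetical helper, not part of anndata or zellkonverter.

```python
from collections.abc import Mapping

import numpy as np


def find_structured(mapping, prefix=""):
    """Return the key paths of all structured NumPy arrays in a nested mapping."""
    hits = []
    for key, value in mapping.items():
        path = f"{prefix}/{key}"
        if isinstance(value, Mapping):
            # Recurse into nested groups such as uns["neighbors"]["params"].
            hits.extend(find_structured(value, path))
        elif isinstance(value, np.ndarray) and value.dtype.names is not None:
            hits.append(path)
    return hits
```

With a loaded AnnData object this would be called as `find_structured(adata.uns)`; the top-level keys of the returned paths are the candidates to leave out of the `uns` argument.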

rcannood commented 1 year ago

I encountered it once as well with the uns/rank_genes_groups/scores in:

cache <- BiocFileCache::BiocFileCache(ask = FALSE)
example_file <- BiocFileCache::bfcrpath(
    cache, "https://ndownloader.figshare.com/files/30462915"
)

sce <- readH5AD(example_file, raw = TRUE)

One approach we could try is to add an explicit converter along the lines of:

# workaround for Error in py_ref_to_r(x) :
# Conversion from numpy array type 20 is not supported
# see https://github.com/theislab/zellkonverter/issues/45
py_to_r.numpy.ndarray <- function(x) {
    disable_conversion_scope(x)

    if (x$dtype$num == 20) {
        np <- import("numpy", convert = TRUE)
        out <- 
            tryCatch({
                # assuming is float
                x$dtype <- np$dtype("float32")
                py_to_r(x)
            }, error = function(e) {
                warning("Could not convert numpy array type 20, skipping conversion")
                NULL
            })
        return(out)
    }

    # no special handler found; delegate to next method
    NextMethod()
}

(This solution is 100% untested ;))