Closed const-ae closed 1 year ago
Hi @const-ae
Thanks for the issue (and giving zellkonverter a go)! It definitely could be whatever is in varm. I should be able to download the dataset from the link and have a look. If you want to make a smaller example that would be great. The easiest way is probably to load it in Python, subset to a smaller number of genes/cells, and save a new .h5ad. Don't worry about it if that sounds like too much effort though.
Ok, I have spent some time looking into this. First, I can confirm this is an issue (using the smaller B_Plasma.h5ad file from the same link). The culprit is indeed the COMPOUND data stored in adata.varm["de_res"]. When you read this from disk in Python you get a 1-dimensional ndarray of tuples, which seems really weird but is the result of using pandas.DataFrame.to_records to create a numpy.recarray (with some information lost during the trip to disk).
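To make the "1-dimensional ndarray of tuples" concrete, here is a minimal sketch using only numpy. The array is a hypothetical stand-in for adata.varm["de_res"], built by hand rather than read from the file or created with pandas, and the field names logFC and pval are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for adata.varm["de_res"]: two per-gene statistics
# stored as one structured (record) array instead of a 2-D matrix.
de_res = np.array(
    [(1.5, 0.01), (-0.3, 0.40), (2.1, 0.001)],
    dtype=[("logFC", "f8"), ("pval", "f8")],
)

print(de_res.ndim)         # 1: one dimension, each element is a whole record
print(de_res.dtype.names)  # ('logFC', 'pval')
print(de_res.dtype.kind)   # 'V' (void), NumPy's kind for structured dtypes
print(de_res.dtype.num)    # 20, NumPy's type number for the void type
print(de_res[0])           # a single record, displayed like a tuple

# A recarray is the same data with fields exposed as attributes.
rec = de_res.view(np.recarray)
print(rec.logFC)
print(isinstance(rec, np.recarray), type(de_res) is np.ndarray)
```

The type number 20 printed here is the same number that appears in the "Conversion from numpy array type 20 is not supported" errors reported later in this thread.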
For zellkonverter there are a couple of things that need to be done:
- Checks in the varm conversion so that at least there isn't an error if something fails to convert
- Handling recarray objects and converting them to "normal" numpy arrays
I should be able to get at least the conversion check into the next release, but I'm not so sure about the recarray conversion.
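As a rough illustration of the two points above, the following Python sketch checks whether an array is structured and, when every field is numeric, converts it to a plain 2-D array. It is not zellkonverter's implementation: the helper name to_plain and the example arrays are hypothetical, and returning None stands in for "skip this item instead of raising an error".

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def to_plain(arr):
    """Return a plain array for a structured array, or None if unsafe.

    Hypothetical helper: arrays without named fields pass through unchanged.
    """
    if arr.dtype.names is None:
        return arr  # already a "normal" array
    field_dtypes = [arr.dtype[name] for name in arr.dtype.names]
    if not all(np.issubdtype(dt, np.number) for dt in field_dtypes):
        return None  # mixed or non-numeric fields: skip rather than error
    # One column per field, one row per record.
    return rfn.structured_to_unstructured(arr)


de_res = np.array(
    [(1.5, 0.01), (-0.3, 0.40)],
    dtype=[("logFC", "f8"), ("pval", "f8")],
)
mixed = np.array([("GeneA", 1.5)], dtype=[("name", "U10"), ("logFC", "f8")])

print(to_plain(de_res).shape)  # (2, 2)
print(to_plain(mixed))         # None
```

Arrays with mixed field types would more naturally map to a data frame than to a matrix, which is one reason the full conversion is harder than the check.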
This is great. Thank you so much for your effort :)
Ok, I've added checks that should prevent this error in the current dev version (v1.1.8) but without any conversion. I would like to be able to handle the conversion as well but I think that will require rewriting some things so might not happen for this release.
This is already great. Thank you for the effort :)
I'm having a similar issue
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] reticulate_1.25 SpatialExperiment_1.6.1
[3] zellkonverter_1.6.4 SingleCellExperiment_1.18.0
[5] SummarizedExperiment_1.26.1 Biobase_2.56.0
[7] GenomicRanges_1.48.0 GenomeInfoDb_1.32.3
[9] IRanges_2.30.0 S4Vectors_0.34.0
[11] BiocGenerics_0.42.0 MatrixGenerics_1.8.1
[13] matrixStats_0.62.0 shiny_1.7.2
Error is similar to OP:
> sce <- zellkonverter::readH5AD(h5ad.file)
Error in py_ref_to_r(x) :
Conversion from numpy array type 20 is not supported
I did try rolling back to Zellkonverter 1.6.0 and the error is not present. Is this perhaps an anndata version issue?
There shouldn't be any major differences between {zellkonverter} v1.6.0 and v1.6.4. Could you please post the output with verbose = TRUE and if possible share the file you are using?
verbose = TRUE output:
> sce <- zellkonverter::readH5AD(h5ad.file,verbose = TRUE)
ℹ Using anndata version 0.8.0
ℹ Using the Python reader
✔ Read .../.../.../.../.../adata_Tier1_2_3_combined.h5ad [32.1s]
Error in py_ref_to_r(x) :
Conversion from numpy array type 20 is not supported
✖ Converting AnnData to SingleCellExperiment ... failed
Traceback on error:
Error in py_ref_to_r(x) :
Conversion from numpy array type 20 is not supported
23. stop(structure(list(message = "Conversion from numpy array type 20 is not supported",
    call = py_ref_to_r(x), cppstack = NULL), class = c("Rcpp::exception",
    "C++Error", "error", "condition")))
22. py_ref_to_r(x)
21. py_to_r.default(result)
20. py_to_r(result)
19. python_builtins$dict(x)
18. py_to_r.collections.abc.Mapping(x)
17. py_to_r(x)
16. py_maybe_convert(object, py_has_convert(x))
15. py_get_attr_or_item(x, name, TRUE)
14. `$.python.builtin.object`(private$.anndata, uns)
13. private$.anndata$uns
12. py_to_r_ifneedbe(private$.anndata$uns)
11. (function (value)
    {
    if (missing(value)) {
    py_to_r_ifneedbe(private$.anndata$uns) ...
10. py_resolve_dots(list(...))
9. py_builtins$list(adata$uns$keys())
8. .convert_anndata_slot(adata, "uns", py_builtins$list(adata$uns$keys()),
    "metadata", select = uns)
7. AnnData2SCE(adata, X_name = X_name, hdf5_backed = backed, verbose = verbose,
    ...)
6. (function (file, X_name = NULL, backed = FALSE, verbose = NULL,
    ...)
    {
    .ui_info("Using the {.field Python} reader") ...
5. do.call(.basilisk.fun, .basilisk.args)
4. evalq(do.call(.basilisk.fun, .basilisk.args), envir = proc, enclos = proc)
3. evalq(do.call(.basilisk.fun, .basilisk.args), envir = proc, enclos = proc)
2. basiliskRun(env = env, fun = .H5ADreader, file = file, X_name = X_name,
    backed = use_hdf5, verbose = verbose, ...)
1. zellkonverter::readH5AD(h5ad.file)
I can't attach the object but here's the content
> rhdf5::h5ls(h5ad.file)
group name otype dclass dim
0 / X H5I_DATASET FLOAT 990 x 1348582
1 / layers H5I_GROUP
2 / obs H5I_GROUP
3 /obs FOV H5I_DATASET INTEGER 1348582
4 /obs Median Cell counts Raw Data Technical_repeat H5I_DATASET FLOAT 1348582
5 /obs Mouse_ID H5I_GROUP
6 /obs/Mouse_ID categories H5I_DATASET STRING 15
7 /obs/Mouse_ID codes H5I_DATASET INTEGER 1348582
8 /obs Sample_type H5I_GROUP
9 /obs/Sample_type categories H5I_DATASET STRING 4
10 /obs/Sample_type codes H5I_DATASET INTEGER 1348582
11 /obs Slice_ID H5I_GROUP
12 /obs/Slice_ID categories H5I_DATASET STRING 52
13 /obs/Slice_ID codes H5I_DATASET INTEGER 1348582
14 /obs Technical_repeat_number H5I_GROUP
15 /obs/Technical_repeat_number categories H5I_DATASET STRING 20
16 /obs/Technical_repeat_number codes H5I_DATASET INTEGER 1348582
17 /obs Tier1 H5I_GROUP
18 /obs/Tier1 categories H5I_DATASET STRING 8
19 /obs/Tier1 codes H5I_DATASET INTEGER 1348582
20 /obs Tier2 H5I_GROUP
21 /obs/Tier2 categories H5I_DATASET STRING 64
22 /obs/Tier2 codes H5I_DATASET INTEGER 1348582
23 /obs Tier2_and_3_merged H5I_GROUP
24 /obs/Tier2_and_3_merged categories H5I_DATASET STRING 72
25 /obs/Tier2_and_3_merged codes H5I_DATASET INTEGER 1348582
26 /obs _index H5I_DATASET STRING 1348582
27 /obs cell_IDs H5I_DATASET STRING 1348582
28 /obs log_counts H5I_DATASET FLOAT 1348582
29 /obs n_counts H5I_DATASET FLOAT 1348582
30 /obs n_genes H5I_DATASET INTEGER 1348582
31 /obs norm_factor H5I_DATASET FLOAT 1348582
32 /obs sample H5I_GROUP
33 /obs/sample categories H5I_DATASET STRING 20
34 /obs/sample codes H5I_DATASET INTEGER 1348582
35 /obs x H5I_DATASET FLOAT 1348582
36 /obs y H5I_DATASET FLOAT 1348582
37 / obsm H5I_GROUP
38 /obsm X_pca H5I_DATASET FLOAT 940 x 1348582
39 /obsm X_umap H5I_DATASET FLOAT 2 x 1348582
40 / obsp H5I_GROUP
41 /obsp connectivities H5I_GROUP
42 /obsp/connectivities data H5I_DATASET FLOAT 37326600
43 /obsp/connectivities indices H5I_DATASET INTEGER 37326600
44 /obsp/connectivities indptr H5I_DATASET INTEGER 1348583
45 /obsp distances H5I_GROUP
46 /obsp/distances data H5I_DATASET FLOAT 18734083
47 /obsp/distances indices H5I_DATASET INTEGER 18734083
48 /obsp/distances indptr H5I_DATASET INTEGER 1348583
49 / raw H5I_GROUP
50 /raw X H5I_DATASET FLOAT 990 x 1348582
51 /raw var H5I_GROUP
52 /raw/var _index H5I_DATASET STRING 990
53 /raw varm H5I_GROUP
54 / uns H5I_GROUP
55 /uns DAPI_dir H5I_GROUP
56 /uns/DAPI_dir 062221_D9_m3_2 H5I_DATASET STRING ( 0 )
57 /uns/DAPI_dir 062921_D0_m3a_1 H5I_DATASET STRING ( 0 )
58 /uns/DAPI_dir 062921_D0_m3a_2 H5I_DATASET STRING ( 0 )
59 /uns/DAPI_dir 062921_D9_m2a_1 H5I_DATASET STRING ( 0 )
60 /uns/DAPI_dir 062921_D9_m2a_2 H5I_DATASET STRING ( 0 )
61 /uns/DAPI_dir 062921_D9_m5_1 H5I_DATASET STRING ( 0 )
62 /uns/DAPI_dir 062921_D9_m5_2 H5I_DATASET STRING ( 0 )
63 /uns/DAPI_dir 082421_D0_m6_1 H5I_DATASET STRING ( 0 )
64 /uns/DAPI_dir 082421_D0_m7_1 H5I_DATASET STRING ( 0 )
65 /uns/DAPI_dir 082421_D21_m1_1 H5I_DATASET STRING ( 0 )
66 /uns/DAPI_dir 082421_D21_m2_1 H5I_DATASET STRING ( 0 )
67 /uns/DAPI_dir 092421_D3_m1_1 H5I_DATASET STRING ( 0 )
68 /uns/DAPI_dir 092421_D3_m2_1 H5I_DATASET STRING ( 0 )
69 /uns/DAPI_dir 092421_D3_m3_1 H5I_DATASET STRING ( 0 )
70 /uns/DAPI_dir 092421_D3_m4_1 H5I_DATASET STRING ( 0 )
71 /uns/DAPI_dir 100221_D9_m2_1 H5I_DATASET STRING ( 0 )
72 /uns/DAPI_dir 100221_D9_m3_1 H5I_DATASET STRING ( 0 )
73 /uns/DAPI_dir 100221_D9_m3_2 H5I_DATASET STRING ( 0 )
74 /uns/DAPI_dir 100221_D9_m5_1 H5I_DATASET STRING ( 0 )
75 /uns/DAPI_dir 100221_D9_m5_2 H5I_DATASET STRING ( 0 )
76 /uns Sample_type_colors H5I_DATASET STRING 4
77 /uns Tier1_colors H5I_DATASET STRING 8
78 /uns Tier2_and_3_merged_colors H5I_DATASET STRING 72
79 /uns Tier2_colors H5I_DATASET STRING 64
80 /uns hvg H5I_GROUP
81 /uns/hvg flavor H5I_DATASET STRING ( 0 )
82 /uns leiden H5I_GROUP
83 /uns/leiden params H5I_GROUP
84 /uns/leiden/params n_iterations H5I_DATASET INTEGER ( 0 )
85 /uns/leiden/params random_state H5I_DATASET INTEGER ( 0 )
86 /uns/leiden/params resolution H5I_DATASET FLOAT ( 0 )
87 /uns log1p H5I_GROUP
88 /uns/log1p base H5I_DATASET INTEGER ( 0 )
89 /uns neighbors H5I_GROUP
90 /uns/neighbors connectivities_key H5I_DATASET STRING ( 0 )
91 /uns/neighbors distances_key H5I_DATASET STRING ( 0 )
92 /uns/neighbors params H5I_GROUP
93 /uns/neighbors/params method H5I_DATASET STRING ( 0 )
94 /uns/neighbors/params metric H5I_DATASET STRING ( 0 )
95 /uns/neighbors/params n_neighbors H5I_DATASET INTEGER ( 0 )
96 /uns/neighbors/params random_state H5I_DATASET INTEGER ( 0 )
97 /uns pca H5I_GROUP
98 /uns/pca params H5I_GROUP
99 /uns/pca/params use_highly_variable H5I_DATASET ENUM ( 0 )
100 /uns/pca/params zero_center H5I_DATASET ENUM ( 0 )
101 /uns/pca variance H5I_DATASET FLOAT 940
102 /uns/pca variance_ratio H5I_DATASET FLOAT 940
103 /uns rank_genes_leiden_r3.3 H5I_GROUP
104 /uns/rank_genes_leiden_r3.3 logfoldchanges H5I_DATASET COMPOUND 990
105 /uns/rank_genes_leiden_r3.3 names H5I_DATASET COMPOUND 990
106 /uns/rank_genes_leiden_r3.3 params H5I_GROUP
107 /uns/rank_genes_leiden_r3.3/params corr_method H5I_DATASET STRING ( 0 )
108 /uns/rank_genes_leiden_r3.3/params groupby H5I_DATASET STRING ( 0 )
109 /uns/rank_genes_leiden_r3.3/params method H5I_DATASET STRING ( 0 )
110 /uns/rank_genes_leiden_r3.3/params reference H5I_DATASET STRING ( 0 )
111 /uns/rank_genes_leiden_r3.3/params use_raw H5I_DATASET ENUM ( 0 )
112 /uns/rank_genes_leiden_r3.3 pvals H5I_DATASET COMPOUND 990
113 /uns/rank_genes_leiden_r3.3 pvals_adj H5I_DATASET COMPOUND 990
114 /uns/rank_genes_leiden_r3.3 scores H5I_DATASET COMPOUND 990
115 /uns umap H5I_GROUP
116 /uns/umap params H5I_GROUP
117 /uns/umap/params a H5I_DATASET FLOAT ( 0 )
118 /uns/umap/params b H5I_DATASET FLOAT ( 0 )
119 / var H5I_GROUP
120 /var _index H5I_DATASET STRING 990
121 /var dispersions H5I_DATASET FLOAT 990
122 /var dispersions_norm H5I_DATASET FLOAT 990
123 /var highly_variable H5I_DATASET ENUM 990
124 /var means H5I_DATASET FLOAT 990
125 /var n_cells H5I_DATASET INTEGER 990
126 / varm H5I_GROUP
127 /varm PCs H5I_DATASET FLOAT 940 x 990
128 / varp H5I_GROUP
Ok, so it seems like something to do with adata.uns is the issue. Possibly we can add a check for this but we need to know exactly what the problem is first. Can you please try excluding things using the uns argument? Start by setting uns = FALSE and then, if that works, try uns = c("item1", "item2", ...), adding/removing items until you find what the culprit is.
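If Python is available, the search can also be narrowed from that side before trying different uns selections. The sketch below walks a nested mapping and reports every array with a structured dtype. The function name find_structured and the example dictionary are hypothetical; with a real file you would pass adata.uns after loading it with anndata. This assumes the failing items are structured arrays, which the COMPOUND datasets under /uns/rank_genes_leiden_r3.3 in the listing above suggest but do not prove.

```python
from collections.abc import Mapping

import numpy as np


def find_structured(mapping, prefix=""):
    """Yield slash-separated paths of structured arrays in a nested mapping."""
    for key, value in mapping.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested groups such as uns["pca"]["params"].
            yield from find_structured(value, path)
        elif isinstance(value, np.ndarray) and value.dtype.names is not None:
            yield path


# Hypothetical stand-in for adata.uns.
uns = {
    "pca": {"variance": np.zeros(3)},
    "rank_genes_groups": {
        "params": {"method": "t-test"},
        "scores": np.array([(0.1, 0.2)], dtype=[("0", "f4"), ("1", "f4")]),
    },
}

print(list(find_structured(uns)))  # ['rank_genes_groups/scores']
```

The first component of each reported path is the top-level item to leave out of the uns selection when testing.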
I encountered it once as well with the uns/rank_genes_groups/scores in:
cache <- BiocFileCache::BiocFileCache(ask = FALSE)
example_file <- BiocFileCache::bfcrpath(
    cache, "https://ndownloader.figshare.com/files/30462915"
)
sce <- zellkonverter::readH5AD(example_file, raw = TRUE)
One approach we could try to use is to add an explicit converter along the lines of:
# Workaround for: Error in py_ref_to_r(x) :
#   Conversion from numpy array type 20 is not supported
# see https://github.com/theislab/zellkonverter/issues/45
py_to_r.numpy.ndarray <- function(x) {
    disable_conversion_scope(x)
    # NumPy type number 20 is the void type used for structured (record) arrays
    if (x$dtype$num == 20) {
        np <- import("numpy", convert = TRUE)
        out <- tryCatch({
            # Assumes the data are float32 and reinterprets the array as such
            x$dtype <- np$dtype("float32")
            py_to_r(x)
        }, error = function(e) {
            warning("Could not convert numpy array type 20, skipping conversion")
            NULL
        })
        return(out)
    }
    # No special handler found; delegate to the next method
    NextMethod()
}
(This solution is 100% untested ;))
Hi Luke,
I tried to load a single cell dataset from https://singlecell.broadinstitute.org/single_cell/study/SCP1052/covid-19-lung-autopsy-samples#study-summary; however, I get the following error in line https://github.com/theislab/zellkonverter/blob/069239ee6ae73d2b2f205681c15056e33de3a982/R/konverter.R#L160
I assume that the problem is the COMPOUND datatype in /varm; however, I am not quite sure. I could try to provide a reduced file as the original is quite big (1.5 GB), but I am not sure what the best way is to reduce the size of an h5ad file.
Best, Constantin