Closed amoyguang1 closed 1 year ago
Hi @amoyguang1, thanks for giving {zellkonverter} a go. I'm not sure what exactly is happening here. Could you try running traceback() to see where exactly the error is coming from?
If it's possible it would be really helpful if you could share the file that is causing this error (even a stripped back version without most of the information would be great as long as it causes this problem). If not I might have some more suggestions for things to look at but it would be more messing around on your side.
Hi lazappi. Thanks a lot for the quick reply. Here is the traceback().
sce <- readH5AD('pbmc.h5ad')
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'colnames': INTEGER() can only be applied to a 'integer', not a 'double'
traceback()
11: h(simpleError(msg, call))
10: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "INTEGER() can only be applied to a 'integer', not a 'double'", base::quote(py_convert_pandas_df(x)))
9: py_convert_pandas_df(x)
8: py_to_r.pandas.core.frame.DataFrame(adata$obs)
7: py_to_r(adata$obs)
6: colnames(adata_df)
5: .convert_anndata_df(py_to_r(adata$obs), slot_name = "obs", to_name = "colData", select = obs)
4: AnnData2SCE(adata, X_name = X_name, hdf5_backed = backed, verbose = verbose, ...)
3: fun(...)
2: basiliskRun(env = env, fun = .H5ADreader, file = file, X_name = X_name, backed = use_hdf5, verbose = verbose, ...)
1: readH5AD("pbmc.h5ad")
I think it may be because my data has TCR and BCR data as well. Here is the AnnData info.
AnnData object with n_obs × n_vars = 80816 × 2191
    obs: 'multi_chain', 'high_confidence', 'is_cell', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'has_ir', 'sample', 'group', 'patient', 'disease', 'batch', 'n_genes', 'scrublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'gmm_pct_count_clusters_keep', 'is_doublet', 'filter_rna', 'leiden'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'batch_colors', 'has_ir_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'sample_colors', 'umap'
    obsm: 'X_pca', 'X_pca_harmony', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'
I think the IR data may be causing the problem. Is it possible to drop those obs columns in the conversion process? Do you have an email address I can share the data with? Thank you.
Hmmm...I'm guessing maybe the wrong type is getting detected somehow? This might actually be a {reticulate} issue but we can look into it a bit more first.
The obs argument of readH5AD()/AnnData2SCE() lets you select what is converted: TRUE converts everything, FALSE converts nothing, and a vector of column names will convert only those columns. This is definitely worth a try, but depending on the order things happen it might not avoid this error.
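As a rough sketch of that suggestion, the snippet below builds a vector of obs column names that leaves out the `IR_*` columns and passes it as `obs`. The column names are a shortened, illustrative subset of those in the AnnData summary above, and `drop_ir_columns()` is a hypothetical helper, not part of {zellkonverter}. The traceback shows `py_to_r(adata$obs)` being evaluated inside the call to `.convert_anndata_df()`, so the whole data frame may be converted before any selection happens, in which case this would not avoid the error.

```r
# Hypothetical helper: keep every obs column except those with a given prefix.
drop_ir_columns <- function(obs_cols, prefix = "IR_") {
  obs_cols[!startsWith(obs_cols, prefix)]
}

# Illustrative subset of the obs columns from the AnnData summary above.
obs_cols <- c("multi_chain", "IR_VJ_1_c_call", "IR_VDJ_1_junction_aa",
              "has_ir", "sample", "leiden")
keep <- drop_ir_columns(obs_cols)
print(keep)

# Only attempt the conversion if zellkonverter and the file are available.
if (requireNamespace("zellkonverter", quietly = TRUE) && file.exists("pbmc.h5ad")) {
  sce <- zellkonverter::readH5AD("pbmc.h5ad", obs = keep)
}
```

Columns such as multi_chain or has_ir also relate to the receptor data but do not share the prefix, so they would need to be dropped by name if they turn out to matter.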
Given it seems to be an issue with obs, we should be able to reproduce it with just a subset of cells (maybe 100 or so) and without any of var, obsm, layers etc. If you make a small file with just that you might be able to paste it directly here or share a link to Google Drive/OneDrive etc.
Have you checked that this file gives the same error? I was able to read it without any errors.
You are right. The trial file works for me as well, but the actual file does not. Not sure why.
Maybe the file is too big to load into memory?
Possibly...? Or by subsetting you have removed whatever is causing the issue. Could you please play around a bit and see if you can get a small file that reproduces the issue?
It might also be helpful to know what software versions you are using (output of sessionInfo()), I forgot to ask for that earlier.
I tried removing different obs columns. The problem remains when I try to open the dataset, which is about 2 GB. If you could share your email, I can share it with you privately. Thank you very much.
sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zellkonverter_1.8.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 compiler_4.2.3 GenomeInfoDb_1.34.9
[4] XVector_0.38.0 basilisk.utils_1.10.0 MatrixGenerics_1.10.0
[7] bitops_1.0-7 tools_4.2.3 zlibbioc_1.44.0
[10] SingleCellExperiment_1.20.1 digest_0.6.31 jsonlite_1.8.4
[13] evaluate_0.20 lattice_0.20-45 png_0.1-8
[16] rlang_1.1.0 Matrix_1.5-3 dir.expiry_1.6.0
[19] DelayedArray_0.23.2 cli_3.6.1 rstudioapi_0.14
[22] filelock_1.0.2 parallel_4.2.3 yaml_2.3.7
[25] xfun_0.38 fastmap_1.1.1 GenomeInfoDbData_1.2.9
[28] withr_2.5.0 knitr_1.42 S4Vectors_0.36.2
[31] IRanges_2.32.0 rprojroot_2.0.3 stats4_4.2.3
[34] grid_4.2.3 here_1.0.1 reticulate_1.28
[37] Biobase_2.58.0 basilisk_1.10.2 rmarkdown_2.21
[40] matrixStats_0.63.0 htmltools_0.5.5 BiocGenerics_0.44.0
[43] GenomicRanges_1.50.2 SummarizedExperiment_1.28.0 RCurl_1.98-1.12
Please send it to luke@lazappi.id.au. Thanks.
Thanks for providing the file by email. I was able to read it without any problems using the release and devel versions. I wonder if something about your system is the issue? It's a bit hard to look into this more until we can get a reproducible example.
Thank you, Luke. Don't worry about it. Maybe it is just me. I am not sure why. I use Windows 11 with an AMD CPU.
I'm hoping this issue has been resolved but please comment if needed.
Hello,
I got the same issue with an H5AD file from CZI's Cellxgene, and I've attached a screenshot of the error. I can provide the file as well.
Hi @mingl1997
Can you please send a link to the file you had issues with? It would also be good to have the output from sessionInfo().
Sure, here is the session info:
I don't remember which link it is in particular because when a file is downloaded from CZI, it is simply called local.h5ad - I can email it instead.
It would be great if you could work out the file source, that way we can look at adding it to our automated tests (especially as I think this is a Windows issue).
The other thing you can see from the warnings above is that {reticulate} is using the base conda environment rather than the environment created by {zellkonverter}. That may be the issue so I would try to fix that first.
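One way to start checking this, as a minimal sketch only: report whether the RETICULATE_PYTHON environment variable is set (reticulate uses it to force a particular Python) and, if Python has already been initialised in the session, print the configuration {reticulate} is bound to. `report_python_override()` is a hypothetical helper. Because {zellkonverter} runs its Python code through {basilisk}, possibly in a separate process, the main session may not have Python initialised at all, so an empty result here is not conclusive.

```r
# Hypothetical helper: report any Python override that reticulate would honour.
report_python_override <- function() {
  override <- Sys.getenv("RETICULATE_PYTHON", unset = NA)
  if (is.na(override)) {
    "RETICULATE_PYTHON is not set"
  } else {
    paste("RETICULATE_PYTHON =", override)
  }
}
message(report_python_override())

# Show the Python that reticulate is bound to, if one is already initialised.
if (requireNamespace("reticulate", quietly = TRUE) && reticulate::py_available()) {
  print(reticulate::py_config())
}
```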
Hello there, I am very interested in using the tool. I'm not sure why I can't convert AnnData to SCE. It seems that the colnames type needs to be changed. Thank you.