theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

'colnames': INTEGER() can only be applied to a 'integer', not a 'double' #91

Closed amoyguang1 closed 1 year ago

amoyguang1 commented 1 year ago

Hello there, I am very interested in using this tool, but I can't convert my AnnData file to an SCE and I'm not sure why. It seems that the type of the colnames needs to be changed. Thank you.

sce <- readH5AD('pbmc.h5ad')
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'colnames': INTEGER() can only be applied to a 'integer', not a 'double'

sce <- readH5AD('pbmc.h5ad', verbose=T, layers=F, varm=F, obsm=F, varp=F, obsp=F, uns=F)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
✔ Read ./pbmc.h5ad [5.2s]
ℹ Skipping conversion of uns
✔ X matrix converted to assay [8.7s]
ℹ Skipping conversion of layers
✔ var converted to rowData [104ms]
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'colnames': INTEGER() can only be applied to a 'integer', not a 'double'
✖ Converting obs to colData [229ms]
✖ Converting AnnData to SingleCellExperiment ... failed

lazappi commented 1 year ago

Hi @amoyguang1, thanks for giving {zellkonverter} a go. I'm not sure what exactly is happening here. Could you try running traceback() to see where exactly the error is coming from?

If it's possible it would be really helpful if you could share the file that is causing this error (even a stripped back version without most of the information would be great as long as it causes this problem). If not I might have some more suggestions for things to look at but it would be more messing around on your side.

amoyguang1 commented 1 year ago

Hi lazappi. Thanks a lot for the quick reply. Here is the output of traceback().

sce <- readH5AD('pbmc.h5ad')
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'colnames': INTEGER() can only be applied to a 'integer', not a 'double'
traceback()
11: h(simpleError(msg, call))
10: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "INTEGER() can only be applied to a 'integer', not a 'double'", base::quote(py_convert_pandas_df(x)))
9: py_convert_pandas_df(x)
8: py_to_r.pandas.core.frame.DataFrame(adata$obs)
7: py_to_r(adata$obs)
6: colnames(adata_df)
5: .convert_anndata_df(py_to_r(adata$obs), slot_name = "obs", to_name = "colData", select = obs)
4: AnnData2SCE(adata, X_name = X_name, hdf5_backed = backed, verbose = verbose, ...)
3: fun(...)
2: basiliskRun(env = env, fun = .H5ADreader, file = file, X_name = X_name, backed = use_hdf5, verbose = verbose, ...)
1: readH5AD("pbmc.h5ad")

I think it may be because my data also contains TCR and BCR data. Here is the AnnData info.

AnnData object with n_obs × n_vars = 80816 × 2191
    obs: 'multi_chain', 'high_confidence', 'is_cell', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_duplicate_count', 'IR_VJ_2_duplicate_count', 'IR_VDJ_1_duplicate_count', 'IR_VDJ_2_duplicate_count', 'IR_VJ_1_j_call', 'IR_VJ_2_j_call', 'IR_VDJ_1_j_call', 'IR_VDJ_2_j_call', 'IR_VJ_1_junction', 'IR_VJ_2_junction', 'IR_VDJ_1_junction', 'IR_VDJ_2_junction', 'IR_VJ_1_junction_aa', 'IR_VJ_2_junction_aa', 'IR_VDJ_1_junction_aa', 'IR_VDJ_2_junction_aa', 'IR_VJ_1_locus', 'IR_VJ_2_locus', 'IR_VDJ_1_locus', 'IR_VDJ_2_locus', 'IR_VJ_1_productive', 'IR_VJ_2_productive', 'IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_v_call', 'IR_VJ_2_v_call', 'IR_VDJ_1_v_call', 'IR_VDJ_2_v_call', 'has_ir', 'sample', 'group', 'patient', 'disease', 'batch', 'n_genes', 'scrublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'gmm_pct_count_clusters_keep', 'is_doublet', 'filter_rna', 'leiden'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'batch_colors', 'has_ir_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'sample_colors', 'umap'
    obsm: 'X_pca', 'X_pca_harmony', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

amoyguang1 commented 1 year ago

I think the IR data may be causing the problem. Is it possible to drop those obs columns during the conversion? Do you have an email address I can share the data with? Thank you.

lazappi commented 1 year ago

Hmmm...I'm guessing maybe the wrong type is getting detected somehow? This might actually be a {reticulate} issue but we can look into it a bit more first.

The obs argument of readH5AD()/AnnData2SCE() lets you select what is converted: TRUE converts everything, FALSE converts nothing, and a vector of column names converts only those columns. This is definitely worth a try, but depending on the order in which things happen it might not avoid this error.
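A minimal sketch of the column-selection option, assuming the immune receptor columns are the ones prefixed `IR_` in the AnnData summary posted above. The `obs_names` vector is an abbreviated, hand-copied illustration rather than something read from the file, and the conversion only runs if {zellkonverter} is installed and `pbmc.h5ad` is present. The traceback suggests the whole obs data frame is passed through py_to_r() before any selection, so this may well not avoid the error.

```r
# A few obs column names copied by hand from the AnnData summary (abbreviated).
obs_names <- c("multi_chain", "IR_VJ_1_c_call", "IR_VDJ_1_junction_aa",
               "has_ir", "sample", "leiden")

# Keep every column that does not carry the immune receptor prefix.
keep_obs <- obs_names[!startsWith(obs_names, "IR_")]

# Only attempt the conversion when the package and the file are available.
if (requireNamespace("zellkonverter", quietly = TRUE) && file.exists("pbmc.h5ad")) {
    # A character vector for `obs` converts only the named columns.
    sce <- zellkonverter::readH5AD("pbmc.h5ad", obs = keep_obs, verbose = TRUE)
}
```

Passing `obs = FALSE` instead skips the obs conversion entirely, which is a quick way to check whether the rest of the file converts.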

Given it seems to be an issue with obs, we should be able to reproduce it with just a subset of cells (maybe 100 or so) and without any of var, obsm, layers etc. If you make a small file with just that, you might be able to attach it directly here or share a link via Google Drive/OneDrive etc.

lazappi commented 1 year ago

Have you checked that this file gives the same error? I was able to read it without any errors.

amoyguang1 commented 1 year ago

You are right. The trial file works for me as well, but the actual file does not. I'm not sure why.

amoyguang1 commented 1 year ago

Maybe the file is too big to load into memory?

[Image]

lazappi commented 1 year ago

Possibly...? Or by subsetting you have removed whatever is causing the issue. Could you please play around a bit and see if you can get a small file that reproduces the issue?

It would also be helpful to know what software versions you are using (the output of sessionInfo()). I forgot to ask for that earlier.

amoyguang1 commented 1 year ago

I tried removing different obs columns. The problem remains when I try to open the full dataset, which is about 2 GB. If you could share your email address, I can send it to you privately. Thank you very much.

sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] zellkonverter_1.8.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 compiler_4.2.3 GenomeInfoDb_1.34.9
[4] XVector_0.38.0 basilisk.utils_1.10.0 MatrixGenerics_1.10.0
[7] bitops_1.0-7 tools_4.2.3 zlibbioc_1.44.0
[10] SingleCellExperiment_1.20.1 digest_0.6.31 jsonlite_1.8.4
[13] evaluate_0.20 lattice_0.20-45 png_0.1-8
[16] rlang_1.1.0 Matrix_1.5-3 dir.expiry_1.6.0
[19] DelayedArray_0.23.2 cli_3.6.1 rstudioapi_0.14
[22] filelock_1.0.2 parallel_4.2.3 yaml_2.3.7
[25] xfun_0.38 fastmap_1.1.1 GenomeInfoDbData_1.2.9
[28] withr_2.5.0 knitr_1.42 S4Vectors_0.36.2
[31] IRanges_2.32.0 rprojroot_2.0.3 stats4_4.2.3
[34] grid_4.2.3 here_1.0.1 reticulate_1.28
[37] Biobase_2.58.0 basilisk_1.10.2 rmarkdown_2.21
[40] matrixStats_0.63.0 htmltools_0.5.5 BiocGenerics_0.44.0
[43] GenomicRanges_1.50.2 SummarizedExperiment_1.28.0 RCurl_1.98-1.12

lazappi commented 1 year ago

Please send it to luke@lazappi.id.au. Thanks.

lazappi commented 1 year ago

Thanks for providing the file by email. I was able to read it without any problems using the release and devel versions. I wonder if something about your system is the issue? It's a bit hard to look into this more until we can get a reproducible example.

amoyguang1 commented 1 year ago

Thank you, Luke. Don't worry about it. Maybe it is just my setup, though I am not sure why. I use Windows 11 with an AMD CPU.

[Image]

lazappi commented 1 year ago

I'm hoping this issue has been resolved but please comment if needed.

mingl1997 commented 1 year ago

Hello,

I got the same issue with an H5AD file from CZI's Cellxgene, and I've attached a screenshot of the error. I can provide the file as well.

[Image: zellkonverter]

lazappi commented 1 year ago

Hi @mingl1997

Can you please send a link to the file you had issues with? It would also be good to have the output from sessionInfo().

mingl1997 commented 1 year ago

Sure, here is the session info:

[Image: zellkonverter2]

I don't remember which link it is in particular, because when a file is downloaded from CZI it is simply called local.h5ad. I can email it instead.

lazappi commented 1 year ago

It would be great if you could work out the file source. That way we can look at adding it to our automated tests (especially as I think this is a Windows issue).

The other thing you can see from the warnings above is that {reticulate} is using the base conda environment rather than the environment created by {zellkonverter}. That may be the cause, so I would try to fix that first.
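One rough way to check which Python the current R session is bound to is sketched below. The helper `uses_basilisk_env()` is hypothetical, and its heuristic is an assumption: it relies on the environment created by {zellkonverter} being managed by {basilisk} and living under a directory whose path contains "basilisk", which may not hold if the cache location has been customised. The check only reports something if {reticulate} has already initialised Python in this session.

```r
# Hypothetical helper; the path heuristic is an assumption, not a
# zellkonverter or basilisk API.
uses_basilisk_env <- function(python_path) {
    grepl("basilisk", python_path, ignore.case = TRUE)
}

# Report the bound Python only if reticulate has already initialised one,
# so that this check does not itself pick an interpreter.
if (requireNamespace("reticulate", quietly = TRUE) && reticulate::py_available()) {
    python_path <- reticulate::py_config()$python
    message("Python in use: ", python_path)
    message("Looks basilisk-managed: ", uses_basilisk_env(python_path))
}
```

If the reported path points at a base conda installation, restarting R and calling readH5AD() before anything else loads {reticulate} is a reasonable first thing to try.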