theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

help with specific h5ad file #58

Closed johnscrn closed 2 years ago

johnscrn commented 2 years ago

I have an .h5ad file that I can load into Python, where it seems to work with no issues. I want to read it into R instead. Here is the code to reproduce my issue:

url <- "https://storage.googleapis.com/gtex_analysis_v9/snrna_seq_data/GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad"
destfile <- basename(url)  # "GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad"
curl::curl_download(url, destfile)
library(zellkonverter)
# Skip every optional slot so that only X, obs and var are converted
data <- readH5AD(file = destfile, verbose = TRUE, layers = FALSE, varm = FALSE,
                 obsm = FALSE, varp = FALSE, obsp = FALSE, uns = FALSE)

Output:

i Using the Python reader
\ Reading ./GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad
v Read ./GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad [56.1s]

i Converting AnnData to SingleCellExperiment
i Skipping conversion of uns
i Converting X matrix to assay
v X matrix converted to assay [25.5s]

i Skipping conversion of layers
**Error in py_convert_pandas_df(x) : 
  INTEGER() can only be applied to a 'integer', not a 'double'
x Converting AnnData to SingleCellExperiment ... failed**

I'd like to keep the obs and var, but I tried removing them in case one of them was the issue. Can anyone point me in the right direction?

Thank you!

Session info

```r
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] zellkonverter_1.4.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7                  XVector_0.34.0
 [3] GenomicRanges_1.46.1        BiocGenerics_0.40.0
 [5] zlibbioc_1.40.0             IRanges_2.28.0
 [7] here_1.0.1                  lattice_0.20-45
 [9] GenomeInfoDb_1.30.0         tools_4.1.2
[11] parallel_4.1.2              SummarizedExperiment_1.24.0
[13] grid_4.1.2                  Biobase_2.54.0
[15] png_0.1-7                   cli_3.1.0
[17] basilisk_1.6.0              matrixStats_0.61.0
[19] rprojroot_2.0.2             Matrix_1.3-4
[21] dir.expiry_1.2.0            GenomeInfoDbData_1.2.7
[23] BiocManager_1.30.16         S4Vectors_0.32.3
[25] bitops_1.0-7                basilisk.utils_1.6.0
[27] RCurl_1.98-1.5              SingleCellExperiment_1.16.0
[29] glue_1.5.1                  DelayedArray_0.20.0
[31] compiler_4.1.2              filelock_1.0.2
[33] MatrixGenerics_1.6.0        stats4_4.1.2
[35] jsonlite_1.7.2              reticulate_1.22
```

lazappi commented 2 years ago

Hi @johnscrn

This seems a bit weird, and I'm not sure exactly what is going on. I was able to read and convert the file fine (big thanks for providing the file, by the way).

> url <- "https://storage.googleapis.com/gtex_analysis_v9/snrna_seq_data/GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad" 
> temp <- tempfile(fileext = ".h5ad")
> curl::curl_download(url, temp)
> zellkonverter::readH5AD(temp, verbose=T, layers=F, varm=F, obsm=F, varp=F, obsp=F, uns=F)
ℹ Using the Python reader
✓ Read /.../.../rj/.../T/.../file497b6e4aab4c.h5ad [28.2s]
ℹ Skipping conversion of uns                 
✓ X matrix converted to assay [57.1s]       
ℹ Skipping conversion of layers              
ℹ Skipping conversion of varm                
ℹ Skipping conversion of obsm                
ℹ Skipping conversion of varp                
ℹ Skipping conversion of obsp                
✓ SingleCellExperiment constructed [3.5s]   
ℹ Skipping conversion of raw                 
✓ Converting AnnData to SingleCellExperiment ... done
class: SingleCellExperiment 
dim: 17695 209126 
metadata(0):
assays(1): X
rownames(17695): FO538757.2 SAMD11 ... S100B PRMT2
rowData names(18): gene_ids Chromosome ... gene_include n_cells
colnames(209126): CST01_TAGGCATGTAAATACG-skeletalmuscle
  CST01_CCTTACGTCCGTCAAA-skeletalmuscle ... TST03_CACAGGCGTACATCCA-skin
  TST03_GACCAATTCCAGTATG-skin
colData names(47): n_genes fpr ... Tissue channel
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

It seems like maybe {reticulate} is trying to convert something that it thinks is 'integer' but is actually 'double'. I'm not really sure why though. I wonder if maybe it is a platform thing? Would you be able to either a) confirm the same error on another Windows machine or b) try the same file on Linux/macOS and see if that works?
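
One way to narrow this down, sketched here under assumptions rather than as a confirmed diagnosis, is to list the dtype of every `obs` (or `var`) column on the Python side so that unusual ones stand out before the data frame is handed to R. `summarise_columns` is a hypothetical helper, not part of {zellkonverter} or anndata, and the toy values below are made up; only the column names are taken from the output above. Which dtype, if any, triggers the error is not established in this thread.

```python
import pandas as pd


def summarise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, a categorical flag and a missing-value count.

    Hypothetical helper for eyeballing annotation columns; it does not
    reproduce or detect the reticulate error itself.
    """
    rows = []
    for name in df.columns:
        col = df[name]
        rows.append(
            {
                "column": name,
                "dtype": str(col.dtype),
                "is_categorical": isinstance(col.dtype, pd.CategoricalDtype),
                "n_missing": int(col.isna().sum()),
            }
        )
    # Passing the column names explicitly keeps this working for an empty frame.
    summary = pd.DataFrame(
        rows, columns=["column", "dtype", "is_categorical", "n_missing"]
    )
    return summary.set_index("column")


# Toy stand-in for adata.obs (values are invented for illustration).
toy_obs = pd.DataFrame(
    {
        "n_genes": [1200, 950, 1800],
        "fpr": [0.01, None, 0.03],
        "Tissue": pd.Categorical(["skin", "skeletalmuscle", "skin"]),
    }
)
print(summarise_columns(toy_obs))
```

With the real file, the same call could be applied to `adata.obs` and `adata.var` after loading it with `anndata.read_h5ad`.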

johnscrn commented 2 years ago

Thank you for looking into this. I get the same error on my personal computer. I would have tried Linux but ran into R update issues (that I cannot deal with right now). I was able to convert the file after removing all the unnecessary annotation from the anndata. If I get some time I'll add things back and see if I can't figure out which annotation was the issue.

Since you cannot reproduce the problem, you can close this issue. If anyone else comes here after downloading the new GTEx tissue atlas https://www.gtexportal.org/home/datasets ... just delete all but what you absolutely need from the anndata and resave to h5ad.
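
For anyone trying that workaround, a minimal sketch in Python might look like the following. The helper names `keep_columns` and `slim_h5ad` are hypothetical, the output file name is a placeholder, and which columns to keep depends on your analysis; `Tissue` and `gene_ids` are used only because they appear in the output earlier in this thread. The anndata calls used are `read_h5ad` and `write_h5ad`.

```python
import pandas as pd


def keep_columns(df: pd.DataFrame, keep: list) -> pd.DataFrame:
    """Return a copy of df restricted to the columns in keep (hypothetical helper)."""
    missing = [name for name in keep if name not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df.loc[:, list(keep)].copy()


def slim_h5ad(in_path: str, out_path: str, obs_keep: list, var_keep: list) -> None:
    """Read an .h5ad file, drop all but the requested obs/var columns and resave it.

    anndata is imported here so that keep_columns can be used without it.
    """
    import anndata as ad

    adata = ad.read_h5ad(in_path)
    adata.obs = keep_columns(adata.obs, obs_keep)
    adata.var = keep_columns(adata.var, var_keep)
    adata.write_h5ad(out_path)


# Example call (not run here; "slim.h5ad" is a placeholder output name):
# slim_h5ad(
#     "GTEx_8_tissues_snRNAseq_atlas_071421.public_obs.h5ad",
#     "slim.h5ad",
#     obs_keep=["Tissue"],
#     var_keep=["gene_ids"],
# )
```

The slimmed file can then be passed to `readH5AD()` as before. Adding columns back to the keep lists a few at a time is one way to find out which annotation causes the failure.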

Thanks again.

lazappi commented 2 years ago

Thanks, I'm guessing it might be a Windows thing then. Just confirming that this is a public dataset? If so we can add it to our set of tests to a) confirm the issue and b) hopefully come up with a fix.

johnscrn commented 2 years ago

Yes it is public. The link I gave in the last comment will take you to their project page. Here is their data use statement: "All datasets from phs00424.v5.p1 forward will follow the NIH GDS policy. This means that once released through dbGaP, there are no restrictions on use or publication. This document and an accompanying table of dataset releases can be found at http://www.gtexportal.org/home/documentationPage ."

lazappi commented 2 years ago

This is now included in the test suite from the latest release and there don't seem to have been any issues. I'm going to close this but if you are still having issues with the latest {zellkonverter} version please reopen.