theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

'X' matrix does not support transposition and has been skipped #114

Closed GabrielHoffman closed 5 months ago

GabrielHoffman commented 5 months ago

I'm still having this issue after several attempts at reinstalling R (from source and via conda) on my CentOS 7.9.2009 cluster. It worked fine for months, but I must have updated something by mistake, and now I can't get it to work again.

> sce = readH5AD(file, use_hdf5=TRUE)

Warning message:
'X' matrix does not support transposition and has been skipped

This means I can't read the single cell counts into memory. This is a major step in my dreamlet package workflow.

> BiocManager::valid()
[1] TRUE

> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /sc/arion/work/hoffmg01/condaEnv/R43/lib/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
 [3] Biobase_2.62.0              GenomicRanges_1.54.1       
 [5] GenomeInfoDb_1.38.8         IRanges_2.36.0             
 [7] S4Vectors_0.40.2            BiocGenerics_0.48.1        
 [9] MatrixGenerics_1.14.0       matrixStats_1.2.0          
[11] zellkonverter_1.12.1       

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5            jsonlite_1.8.8          BiocManager_1.30.22    
 [4] compiler_4.3.3          crayon_1.5.2            filelock_1.0.3         
 [7] Rcpp_1.0.12             rhdf5filters_1.14.1     bitops_1.0-7           
[10] parallel_4.3.3          png_0.1-8               reticulate_1.35.0      
[13] lattice_0.22-5          XVector_0.42.0          S4Arrays_1.2.1         
[16] DelayedArray_0.28.0     GenomeInfoDbData_1.2.11 rlang_1.1.3            
[19] HDF5Array_1.30.1        dir.expiry_1.10.0       SparseArray_1.2.4      
[22] cli_3.6.2               withr_3.0.0             Rhdf5lib_1.24.2        
[25] zlibbioc_1.48.2         grid_4.3.3              basilisk_1.14.3        
[28] rhdf5_2.46.1            abind_1.4-5             RCurl_1.98-1.14        
[31] basilisk.utils_1.14.1   tools_4.3.3     

Suggestions or fixes?

Best, Gabriel

GabrielHoffman commented 5 months ago

This solved the issue:

sce = readH5AD(file, use_hdf5=TRUE, verbose=TRUE, version='0.8.0')

Since there are multiple versions of AnnData and H5AD, can you add a compatibility check?

Gabriel

lazappi commented 5 months ago

Ok, that's interesting. Using a later anndata version should return the same object so I'm not sure what's happening there. Would it be possible to share a small test file where this happens? That would be really helpful for checking things.

What kinds of checks are you suggesting? I'm not quite sure I understand.

GabrielHoffman commented 5 months ago

Here is the file I get the error on: https://www.synapse.org/#!Synapse:syn51188644

As for a check, I was thinking that you could detect the AnnData version the h5ad file was written with, and then tell the user if the write and read versions are incompatible. Or maybe that's not the issue.

Gabriel
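
A minimal sketch of the kind of check suggested above. It assumes that files written by recent anndata releases carry `encoding-type` and `encoding-version` attributes on the root group and on `X`, and that files without them may come from an older writer. These attributes describe the on-disk encoding rather than the anndata package version itself, so this is only a rough proxy. `describe_encoding()` and `inspect_h5ad_encoding()` are hypothetical helpers, not part of zellkonverter; the second one needs the rhdf5 package.

```r
# Hypothetical helper: summarise the encoding attributes found on an H5AD node.
# `attrs` is a named list such as the one returned by rhdf5::h5readAttributes().
describe_encoding <- function(attrs) {
  type <- attrs[["encoding-type"]]
  version <- attrs[["encoding-version"]]
  if (is.null(type) || is.null(version)) {
    return("no encoding metadata (possibly written by an older anndata)")
  }
  sprintf("%s (encoding version %s)", type, version)
}

# Hypothetical wrapper: report the encoding of the root group and of 'X'.
# Assumes the file has an 'X' node; requires rhdf5 to be installed.
inspect_h5ad_encoding <- function(file) {
  c(
    root = describe_encoding(rhdf5::h5readAttributes(file, "/")),
    X = describe_encoding(rhdf5::h5readAttributes(file, "X"))
  )
}

# Example call (not run here, needs the file from this thread):
# inspect_h5ad_encoding("PsychAD_r0_Dec_28_2022_subset.h5ad")
```

The summary step is kept separate from the file access so it can be tried on a plain list without any HDF5 file at hand.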

lazappi commented 5 months ago

Would it be possible to create a smaller example file? That file is too big to test on my laptop and I can't figure out all the authentication etc. required to download it to a server.

GabrielHoffman commented 5 months ago

Here is a 70 MB subset: https://ghoffman-cdn.s3.us-east-2.amazonaws.com/dreamlet_analysis/data/PsychAD_r0_Dec_28_2022_subset.h5ad

I extracted the subset and did some testing:

library(zellkonverter)

# original file
file = "PsychAD_r0_Dec_28_2022.h5ad"

# read failure
sce = readH5AD(file, use_hdf5=TRUE)
# Warning message:
# 'X' matrix does not support transposition and has been skipped

# success
sce = readH5AD(file, use_hdf5=TRUE, version="0.8.0")

# write subset
# changing compression doesn't affect results
writeH5AD(sce[seq(1000), seq(20000)], file="PsychAD_r0_Dec_28_2022_subset.h5ad", compression="lzf")

# Read failure
sce2 = readH5AD("PsychAD_r0_Dec_28_2022_subset.h5ad", use_hdf5=TRUE)
# Warning message:
# 'X' matrix does not support transposition and has been skipped

# No data read in
sum(assay(sce2, 1))
# gives 0

# success
sce2 = readH5AD("PsychAD_r0_Dec_28_2022_subset.h5ad", use_hdf5=TRUE, version="0.8.0")
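
The manual workaround in the test script above (read, notice the warning, read again with `version="0.8.0"`) could be wrapped up roughly as follows. This is a sketch only: `read_with_fallback()` is a hypothetical helper, and matching on the text "has been skipped" assumes the warning wording stays as quoted in this thread.

```r
# Hypothetical helper: call `reader(...)`. If it emits a warning whose message
# matches `pattern`, muffle that warning and call the reader again with the
# extra arguments in `fallback` appended. Other warnings pass through untouched.
read_with_fallback <- function(reader, ..., fallback = list(),
                               pattern = "has been skipped") {
  skipped <- FALSE
  result <- withCallingHandlers(
    reader(...),
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        skipped <<- TRUE
        invokeRestart("muffleWarning")
      }
    }
  )
  if (!skipped) {
    return(result)
  }
  message("A matrix was skipped; retrying with fallback arguments")
  do.call(reader, c(list(...), fallback))
}

# Example call (not run here, needs zellkonverter and the file):
# sce2 <- read_with_fallback(zellkonverter::readH5AD,
#                            "PsychAD_r0_Dec_28_2022_subset.h5ad",
#                            use_hdf5 = TRUE,
#                            fallback = list(version = "0.8.0"))
```

`withCallingHandlers()` is used rather than `tryCatch()` so that an unrelated warning does not abort the first read.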

lazappi commented 5 months ago

This should be fixed in the devel version now

GabrielHoffman commented 5 months ago

Thanks for the bug fix. It works now, but I get the following warning:

> sce = readH5AD(file, use_hdf5 = TRUE)
sys:1: FutureWarning: Index.format is deprecated and will be removed in a future version. Convert using index.astype(str) or index.map(formatter) instead.
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /hpc/packages/minerva-centos7/R/4.3.3/lib64/R/lib/libRblas.so 
LAPACK: /hpc/packages/minerva-centos7/R/4.3.3/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
 [3] Biobase_2.62.0              GenomicRanges_1.54.1       
 [5] GenomeInfoDb_1.38.8         IRanges_2.36.0             
 [7] S4Vectors_0.40.2            BiocGenerics_0.48.1        
 [9] MatrixGenerics_1.14.0       matrixStats_1.2.0          
[11] zellkonverter_1.13.3       

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5            jsonlite_1.8.8          compiler_4.3.3         
 [4] crayon_1.5.2            filelock_1.0.3          Rcpp_1.0.12            
 [7] rhdf5filters_1.14.1     bitops_1.0-7            parallel_4.3.3         
[10] png_0.1-8               reticulate_1.35.0       lattice_0.22-5         
[13] XVector_0.42.0          S4Arrays_1.2.1          DelayedArray_0.28.0    
[16] GenomeInfoDbData_1.2.11 rlang_1.1.3             HDF5Array_1.30.1       
[19] dir.expiry_1.10.0       SparseArray_1.2.4       cli_3.6.2              
[22] withr_3.0.0             Rhdf5lib_1.24.2         zlibbioc_1.48.2        
[25] grid_4.3.3              basilisk_1.14.3         rhdf5_2.46.1           
[28] abind_1.4-5             RCurl_1.98-1.14         basilisk.utils_1.14.1  
[31] tools_4.3.3 

GabrielHoffman commented 5 months ago

This is due to an issue with reticulate; see https://github.com/rstudio/reticulate/issues/1537

Installing reticulate 1.35.0.9000 from GitHub resolves this issue.
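
A small sketch for checking whether an installed reticulate is at least the version mentioned above. `has_min_version()` is a hypothetical helper built on base R's `package_version()`. The commented install line assumes the remotes package is available and that the fix lives in the `rstudio/reticulate` repository linked above.

```r
# Hypothetical helper: TRUE when `installed` is at least `minimum`.
# package_version() compares component by component, so a development
# build such as 1.35.0.9000 sorts after the 1.35.0 release.
has_min_version <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

has_min_version("1.35.0", "1.35.0.9000")       # release listed in sessionInfo() above
has_min_version("1.35.0.9000", "1.35.0.9000")  # development build

# Checking the library that is actually installed (not run here):
# has_min_version(as.character(utils::packageVersion("reticulate")), "1.35.0.9000")

# Installing the development build from GitHub (not run here):
# remotes::install_github("rstudio/reticulate")
```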