GabrielHoffman closed 8 months ago
This solved the issue:
sce = readH5AD(file, use_hdf5=TRUE, verbose=TRUE, version='0.8.0')
Since there are multiple versions of AnnData and H5AD, can you add a compatibility check?
Gabriel
Ok, that's interesting. Using a later anndata version should return the same object so I'm not sure what's happening there. Would it be possible to share a small test file where this happens? That would be really helpful for checking things.
What kinds of checks are you suggesting? I'm not quite sure I understand.
Here is the file I get the error on: https://www.synapse.org/#!Synapse:syn51188644
As for a check, I was thinking that you could detect the AnnData version the h5ad was written with, and then tell the user if there is a compatibility issue between the read and write versions. Or maybe that's not the issue.
Gabriel
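A rough sketch of the kind of check suggested above, under several assumptions. It assumes the file was written by anndata 0.8 or later, which (as far as I know) stores `encoding-type` and `encoding-version` attributes on the root group of the h5ad file. These describe the on-disk format version, not the anndata package version, so this only approximates "the version the file was written with". The helper names `describe_h5ad_encoding` and `check_h5ad_encoding` and the default supported value "0.1.0" are hypothetical and not part of zellkonverter.

```r
# Hypothetical helper: summarise the root attributes of an h5ad file.
# `attrs` is a named list such as the one returned by
# rhdf5::h5readAttributes(file, "/").
# `supported` is an assumed set of on-disk encoding versions.
describe_h5ad_encoding <- function(attrs, supported = "0.1.0") {
    type <- attrs[["encoding-type"]]
    version <- attrs[["encoding-version"]]
    if (is.null(type) || is.null(version)) {
        return("no encoding attributes found; file may predate anndata 0.8")
    }
    # Attributes can come back as length-1 arrays, so coerce to a string
    version <- as.character(version)[1]
    if (version %in% supported) {
        sprintf("encoding version %s is in the supported set", version)
    } else {
        sprintf("encoding version %s is not in the supported set (%s)",
                version, paste(supported, collapse = ", "))
    }
}

# Hypothetical wrapper: read the root attributes and report on them.
check_h5ad_encoding <- function(file, supported = "0.1.0") {
    if (!requireNamespace("rhdf5", quietly = TRUE)) {
        stop("the rhdf5 package is required to inspect the file")
    }
    attrs <- rhdf5::h5readAttributes(file, "/")
    message(describe_h5ad_encoding(attrs, supported))
    invisible(attrs)
}
```

Calling `check_h5ad_encoding("PsychAD_r0_Dec_28_2022_subset.h5ad")` before `readH5AD()` would then print a short message about the file's format version. Whether that version is actually what causes the read failure here is not established.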
Would it be possible to create a smaller example file? That file is too big to test on my laptop and I can't figure out all the authentication etc. required to download it to a server.
Here is a 70 MB subset: https://ghoffman-cdn.s3.us-east-2.amazonaws.com/dreamlet_analysis/data/PsychAD_r0_Dec_28_2022_subset.h5ad
I extracted the subset and did some testing:
library(zellkonverter)
# original file
file = "PsychAD_r0_Dec_28_2022.h5ad"
# read failure
sce = readH5AD(file, use_hdf5=TRUE)
# Warning message:
# 'X' matrix does not support transposition and has been skipped
# success
sce = readH5AD(file, use_hdf5=TRUE, version="0.8.0")
# write subset
# changing compression doesn't affect results
writeH5AD(sce[seq(1000), seq(20000)], file="PsychAD_r0_Dec_28_2022_subset.h5ad", compression="lzf")
# Read failure
sce2 = readH5AD("PsychAD_r0_Dec_28_2022_subset.h5ad", use_hdf5=TRUE)
# Warning message:
# 'X' matrix does not support transposition and has been skipped
# No data read in
sum(assay(sce2, 1))
# gives 0
# success
sce2 = readH5AD("PsychAD_r0_Dec_28_2022_subset.h5ad", use_hdf5=TRUE, version="0.8.0")
This should be fixed in the devel version now
Thanks for the bug fix. It works now, but I get the following warning:
> sce = readH5AD(file, use_hdf5 = TRUE)
sys:1: FutureWarning: Index.format is deprecated and will be removed in a future version. Convert using index.astype(str) or index.map(formatter) instead.
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /hpc/packages/minerva-centos7/R/4.3.3/lib64/R/lib/libRblas.so
LAPACK: /hpc/packages/minerva-centos7/R/4.3.3/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[3] Biobase_2.62.0 GenomicRanges_1.54.1
[5] GenomeInfoDb_1.38.8 IRanges_2.36.0
[7] S4Vectors_0.40.2 BiocGenerics_0.48.1
[9] MatrixGenerics_1.14.0 matrixStats_1.2.0
[11] zellkonverter_1.13.3
loaded via a namespace (and not attached):
[1] Matrix_1.6-5 jsonlite_1.8.8 compiler_4.3.3
[4] crayon_1.5.2 filelock_1.0.3 Rcpp_1.0.12
[7] rhdf5filters_1.14.1 bitops_1.0-7 parallel_4.3.3
[10] png_0.1-8 reticulate_1.35.0 lattice_0.22-5
[13] XVector_0.42.0 S4Arrays_1.2.1 DelayedArray_0.28.0
[16] GenomeInfoDbData_1.2.11 rlang_1.1.3 HDF5Array_1.30.1
[19] dir.expiry_1.10.0 SparseArray_1.2.4 cli_3.6.2
[22] withr_3.0.0 Rhdf5lib_1.24.2 zlibbioc_1.48.2
[25] grid_4.3.3 basilisk_1.14.3 rhdf5_2.46.1
[28] abind_1.4-5 RCurl_1.98-1.14 basilisk.utils_1.14.1
[31] tools_4.3.3
This is due to an issue with reticulate; see https://github.com/rstudio/reticulate/issues/1537
Installing reticulate 1.35.0.9000 from GitHub resolves this issue.
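For anyone checking whether they are affected, here is a minimal sketch that compares the installed reticulate version against the 1.35.0.9000 version mentioned above. The helper `needs_reticulate_update` is hypothetical, and the suggested install command assumes the remotes package is available. The thread does not say whether a later CRAN release also carries the fix.

```r
# Version reported in this thread as resolving the warning
fixed_version <- package_version("1.35.0.9000")

# Hypothetical helper: TRUE when `installed` is older than the fixed version.
# Accepts a version string or a package_version object.
needs_reticulate_update <- function(installed, fixed = fixed_version) {
    as.package_version(installed) < fixed
}

# Only report if reticulate is installed and older than the fixed version
if (requireNamespace("reticulate", quietly = TRUE) &&
    needs_reticulate_update(utils::packageVersion("reticulate"))) {
    message("reticulate is older than ", as.character(fixed_version),
            "; consider remotes::install_github('rstudio/reticulate')")
}
```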
I'm still having this issue after reinstalling R several times (from source and via conda) on my CentOS 7.9.2009 cluster. It worked fine for months, but I must have updated something by mistake, and now I can't get it to work again.
This means I can't read the single cell counts into memory. This is a major step in my dreamlet package workflow.
Suggestions or fixes?
Best, Gabriel