theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

`readH5AD()` only works after Keyboard Interruption #70

Closed saulvegasauceda closed 1 year ago

saulvegasauceda commented 2 years ago

Hello, I ran into some odd behavior when running readH5AD. The first invocation of readH5AD in a session always stalls right after displaying the anndata version. However, if I interrupt that call, subsequent invocations of readH5AD work as expected. Is there a way to get this to work without the keyboard interruption?

I am trying to run this call for 72 h5 files, so ideally this would work without any user interaction.

Thanks, Saul

Example:

> rna_sce <- readH5AD("./BD1.h5ad", X_name="counts", version =  "0.8.0", reader="python", verbose=TRUE)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
^C
> rna_sce <- readH5AD("./BD1.h5ad", X_name="counts", version =  "0.8.0", reader="python", verbose=TRUE)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
✔ Read ./BD1.h5ad [532ms]
ℹ uns is empty and was skipped
✔ X matrix converted to assay [4s]
ℹ layers is empty and was skipped
✔ var converted to rowData [87ms]
✔ obs converted to colData [46ms]
ℹ varm is empty and was skipped
ℹ obsm is empty and was skipped
ℹ varp is empty and was skipped
ℹ obsp is empty and was skipped
✔ SingleCellExperiment constructed [467ms]
ℹ Skipping conversion of raw
✔ Converting AnnData to SingleCellExperiment ... done

R session info:

R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/saulv/.conda/envs/scRNA_env/lib/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] basilisk_1.8.0              reticulate_1.25
 [3] zellkonverter_1.6.3         scDblFinder_1.10.0
 [5] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
 [7] Biobase_2.56.0              GenomicRanges_1.48.0
 [9] GenomeInfoDb_1.32.3         IRanges_2.30.0
[11] S4Vectors_0.34.0            BiocGenerics_0.42.0
[13] MatrixGenerics_1.8.1        matrixStats_0.62.0

loaded via a namespace (and not attached):
 [1] viridis_0.6.2             edgeR_3.38.4
 [3] BiocSingular_1.12.0       jsonlite_1.8.0
 [5] viridisLite_0.4.0         here_1.0.1
 [7] DelayedMatrixStats_1.18.0 scuttle_1.6.2
 [9] statmod_1.4.37            dqrng_0.3.0
[11] GenomeInfoDbData_1.2.8    vipor_0.4.5
[13] Rsamtools_2.12.0          yaml_2.3.5
[15] ggrepel_0.9.1             pillar_1.8.0
[17] lattice_0.20-45           glue_1.6.2
[19] limma_3.52.2              beachmat_2.12.0
[21] XVector_0.36.0            colorspace_2.0-3
[23] Matrix_1.4-1              XML_3.99-0.10
[25] pkgconfig_2.0.3           dir.expiry_1.4.0
[27] zlibbioc_1.42.0           purrr_0.3.4
[29] scales_1.2.0              ScaledMatrix_1.4.0
[31] BiocParallel_1.30.3       tibble_3.1.8
[33] generics_0.1.3            ggplot2_3.3.6
[35] xgboost_1.6.0.1           cli_3.3.0
[37] magrittr_2.0.3            crayon_1.5.1
[39] fansi_1.0.3               MASS_7.3-58.1
[41] bluster_1.6.0             beeswarm_0.4.0
[43] data.table_1.14.2         tools_4.2.0
[45] scater_1.24.0             BiocIO_1.6.0
[47] lifecycle_1.0.1           basilisk.utils_1.8.0
[49] locfit_1.5-9.6            munsell_0.5.0
[51] cluster_2.1.3             DelayedArray_0.22.0
[53] irlba_2.3.5               Biostrings_2.64.0
[55] compiler_4.2.0            rsvd_1.0.5
[57] rlang_1.0.4               grid_4.2.0
[59] RCurl_1.98-1.8            BiocNeighbors_1.14.0
[61] rjson_0.2.21              igraph_1.3.4
[63] bitops_1.0-7              restfulr_0.0.15
[65] gtable_0.3.0              codetools_0.2-18
[67] R6_2.5.1                  GenomicAlignments_1.32.1
[69] gridExtra_2.3             dplyr_1.0.9
[71] rtracklayer_1.56.1        utf8_1.2.2
[73] rprojroot_2.0.3           filelock_1.0.2
[75] metapod_1.4.0             ggbeeswarm_0.6.0
[77] parallel_4.2.0            Rcpp_1.0.9
[79] png_0.1-7                 scran_1.24.0
[81] vctrs_0.4.1               tidyselect_1.1.2
[83] sparseMatrixStats_1.8.0

lazappi commented 2 years ago

Hi @saulvegasauceda

Thanks for giving {zellkonverter} a go. I think what you might be interrupting is the creation of the {basilisk} Python environment. What happens if you just let it run? Does it finish eventually or just hang forever?

saulvegasauceda commented 2 years ago

I let it run for 11 hours and it did not finish. I think it's safe to assume it would have stalled indefinitely.

LTLA commented 2 years ago

Usually the creation of the basilisk environment would be accompanied by a lot of noise and thunder from Conda. I don't see any of this in the stdout above; and besides, if this was interrupted, subsequent calls should not work.

I assume that the environment was already provisioned before the first call above. I suggest debug()ing a relevant function and stepping through to see where the stall is occurring.

saulvegasauceda commented 2 years ago

Thank you @LTLA @lazappi for responding!

It remains stalled until I interrupt it.

debug(writeH5AD(rna_data, h5_rna, verbose=TRUE))
ℹ Using anndata version 0.8.0
^C

I'm not sure what's causing this behavior, but the withTimeout() function from the R.utils package works around the issue. Here's what I did:

withTimeout({readH5AD(input_file, verbose = TRUE)}, timeout=120, onTimeout="silent")
rna_sce <- readH5AD(input_file, verbose = TRUE)
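
For the 72-file batch mentioned earlier, the same workaround could be wrapped in a small helper. This is only a sketch: `read_all_h5ad()`, its `read_fun` argument, and the `h5ad_dir` directory are hypothetical names, and it assumes the stall only affects the first call in a session, as described in this thread.

```r
# Sketch only: read_all_h5ad(), read_fun and "h5ad_dir" are hypothetical names.
read_all_h5ad <- function(files, warmup_timeout = 120,
                          read_fun = zellkonverter::readH5AD) {
  stopifnot(length(files) > 0)

  # Warm-up: the first call in a session is the one reported to stall,
  # so run it under a time limit and discard the result.
  # onTimeout = "silent" makes withTimeout() return NULL instead of erroring.
  R.utils::withTimeout(
    read_fun(files[[1]], verbose = TRUE),
    timeout = warmup_timeout,
    onTimeout = "silent"
  )

  # Real pass: read every file, including the first one again.
  sces <- lapply(files, function(f) read_fun(f, verbose = TRUE))
  names(sces) <- basename(files)
  sces
}

# Collect the input files from an assumed directory layout.
input_files <- list.files("h5ad_dir", pattern = "\\.h5ad$", full.names = TRUE)
# sce_list <- read_all_h5ad(input_files)
```

If the first call does not stall, the first file is simply read twice, which costs a little time but should not change the result.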

lazappi commented 2 years ago

I think you need to use debug() slightly differently. If you do:

debug(zellkonverter::readH5AD)
zellkonverter::readH5AD(input_file, verbose = TRUE)

That should open the debugging browser. Then you can step through the function line by line and see which line is getting stuck. Depending on where it gets stuck, debug(zellkonverter::AnnData2SCE) might be more useful.

Do you have this issue on another machine and/or with different input files?

lazappi commented 1 year ago

Closing this issue as I hope it has been resolved in recent releases.