theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

High RAM peak while loading h5ad. #92

Open ilibarra opened 1 year ago

ilibarra commented 1 year ago

Thank you for developing this package!

We're using this package to load the Open Problems h5ad file (70k cells, 130k features) in a tutorial for the single-cell best practices book.

While loading the processed NeurIPS dataset (~3.0 GB) with zellkonverter 1.8.0 in R, one gets a peak RAM usage of more than 30 GB. This is exceptionally high and would be hard to accommodate on most local hardware. In comparison, using Python's anndata 0.8.0, the total memory increase after loading is less than 10 GB.

library(zellkonverter)
sce <- readH5AD("oproblems_bmmc_multiome_genes_filtered.h5ad")
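For reference, this is roughly how I check the R-side part of that peak in a fresh session (base R only; memory used by the embedded Python reader does not show up in gc() and needs an OS-level monitor):

library(zellkonverter)

gc(reset = TRUE)  # reset R's "max used" memory counters before loading
sce <- readH5AD("oproblems_bmmc_multiome_genes_filtered.h5ad")
gc()              # the "max used" (Mb) columns report the R-side peak since the reset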

I suspect that sparse-to-dense matrix conversions are generating this memory increase, but it could be something else. In general, I am asking whether there are flags that can be applied, or proposed, at the readH5AD step to avoid this peak memory usage when loading such objects.
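For example, if the use_hdf5 argument documented in ?readH5AD keeps the assays as on-disk HDF5-backed matrices rather than realising them in memory (my reading of the help page, not verified on this file), something along these lines would be the goal:

library(zellkonverter)
library(SingleCellExperiment)

# Keep assay data on disk as HDF5-backed matrices instead of loading
# everything into RAM (see ?readH5AD for use_hdf5)
sce <- readH5AD("oproblems_bmmc_multiome_genes_filtered.h5ad", use_hdf5 = TRUE)

# Check how each assay ended up stored (DelayedMatrix / dgCMatrix / dense matrix)
vapply(assayNames(sce), function(nm) class(assay(sce, nm))[1], character(1))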

GEO - Dataset Processed

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

[21] SingleCellExperiment_1.20.0       zellkonverter_1.8.0     

Thank you,

lazappi commented 1 year ago

Here are some suggestions/comments, but I'm not sure how much they will help:

Let me know if any of that is helpful.
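One workaround that is often worth trying, sketched here under the assumption that dense assays are the main cost (not confirmed for this file): coerce any dense base-matrix assays back to sparse storage after loading. This frees resident memory afterwards but does not lower the peak reached inside readH5AD() itself.

library(SingleCellExperiment)
library(Matrix)

# Assumes `sce` is the object returned by readH5AD() above.
# Coerce any dense base-matrix assays to sparse column-compressed storage.
for (nm in assayNames(sce)) {
  if (is.matrix(assay(sce, nm))) {
    assay(sce, nm) <- as(assay(sce, nm), "CsparseMatrix")
  }
}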