theislab / scib

Benchmarking analysis of data integration tools
MIT License
283 stars 62 forks

Why do I get "Unable to allocate 121. GiB for an array with shape (26734, 606219) and data type float64" every time I run "scib.integration.combat(adata, batch="sample")"? #399

Closed NicoNiCoN11 closed 4 months ago

NicoNiCoN11 commented 4 months ago

I am running this in a Jupyter notebook. I want to integrate my dataset with ComBat, but every time I run it, memory usage climbs above 300 GB, sometimes above 400 GB, and the call fails, probably because it uses too much memory. My dataset contains about 26,000 cells and 50,000 features. The h5ad file is about 11.19 GB, and my remote server has about 450 GB of memory. The error is: "MemoryError: Unable to allocate 121. GiB for an array with shape (26734, 606219) and data type float64"
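
For reference, the size in the error message can be reproduced with a quick calculation. This is only a sketch of the arithmetic: the shape is copied from the error, and `dense_gib` is a hypothetical helper name, not part of scib. Note that the second dimension (606219) is larger than the feature count given in the post; the thread does not explain the difference.

```python
def dense_gib(n_rows, n_cols, bytes_per_value):
    """Size in GiB of a dense n_rows x n_cols array (hypothetical helper)."""
    return n_rows * n_cols * bytes_per_value / 2**30


# Shape copied from the MemoryError in the post.
shape = (26734, 606219)

float64_gib = dense_gib(*shape, 8)  # float64 uses 8 bytes per value
float32_gib = dense_gib(*shape, 4)  # float32 uses 4 bytes per value

print(f"float64: about {float64_gib:.0f} GiB")
print(f"float32: about {float32_gib:.0f} GiB")
```

The float64 figure comes out at roughly 121 GiB for a single array of that shape, and float32 halves it. Any intermediate copies made during the computation would add to this.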

[Three screenshots attached, taken 2024-03-18]

mumichae commented 4 months ago

Hi, there are a few things you can do to reduce the memory footprint. First, float32 is usually more than sufficient for transcriptomics data. Second, we usually recommend removing unexpressed genes and selecting, e.g., highly variable genes to reduce noise in the dataset; 60K genes is quite a lot, and we usually work in the ballpark of 2k-10k genes. Finally, make sure you're using sparse matrices for a reduced memory and storage footprint.
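
A minimal sketch of those three steps on a plain matrix, using only NumPy and SciPy. The helper name `shrink_matrix` and the default `n_top_genes=2000` are assumptions for illustration, and the simple variance ranking is a stand-in for a proper highly-variable-gene selection. In a real AnnData workflow, scanpy's `sc.pp.filter_genes` and `sc.pp.highly_variable_genes` would be the usual tools.

```python
import numpy as np
from scipy import sparse


def shrink_matrix(X, n_top_genes=2000):
    """Hypothetical helper: return (smaller matrix, indices of kept genes).

    X is a cells x genes matrix, dense or sparse.
    """
    # Store as sparse float32 instead of dense float64.
    X = sparse.csr_matrix(X, dtype=np.float32)
    X.eliminate_zeros()

    # Drop genes that are not expressed in any cell.
    expressed = np.flatnonzero(X.getnnz(axis=0) > 0)
    X = X[:, expressed]

    # Rank the remaining genes by variance, Var[x] = E[x^2] - E[x]^2.
    # This is a crude stand-in for highly-variable-gene selection.
    mean = np.asarray(X.mean(axis=0)).ravel()
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    variance = mean_sq - mean**2

    # Keep the top genes, preserving their original order.
    top = np.sort(np.argsort(variance)[::-1][:n_top_genes])
    return X[:, top], expressed[top]


# Toy data: 50 cells x 30 genes, with the first 5 genes unexpressed.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(50, 30)).astype(np.float64)
counts[:, :5] = 0

small, kept = shrink_matrix(counts, n_top_genes=10)
print(small.shape, small.dtype, sparse.issparse(small))
```

Even if a later step converts the reduced matrix back to a dense array, 26734 cells by 2000 genes in float32 is only about 0.2 GiB.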

If these steps don't help, please consider using the ComBat function implemented in scanpy: https://scanpy.readthedocs.io/en/latest/api/generated/scanpy.pp.combat.html