Closed — NicoNiCoN11 closed this 8 months ago
Hi, there are a couple of things you can do to reduce the memory footprint. float32 is usually more than sufficient for transcriptomics data. We also recommend removing unexpressed genes and selecting e.g. highly variable genes to reduce noise in the dataset — 60K genes is quite a lot, and we usually work in the ballpark of 2k–10k genes. Finally, make sure you're using sparse matrices for a reduced memory and storage footprint.
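A rough back-of-the-envelope sketch of why these three steps help, using only plain Python arithmetic (the matrix dimensions and the 10% non-zero fraction below are hypothetical, chosen to roughly match the dataset described in this issue; in scanpy you would apply the actual steps via `sc.pp.filter_genes`, `sc.pp.highly_variable_genes`, and by storing `adata.X` as a float32 `scipy.sparse` matrix):

```python
# Hypothetical dimensions, for illustration only.
n_cells, n_genes = 26_000, 60_000

# Dense float64 (the default for many operations): 8 bytes per entry.
dense64 = n_cells * n_genes * 8 / 1024**3   # GiB

# Casting to float32 halves the footprint.
dense32 = n_cells * n_genes * 4 / 1024**3   # GiB

# Sparse (CSR) storage scales with the number of non-zeros, not the shape.
# scRNA-seq count matrices are often ~90% zeros; each stored non-zero costs
# ~8 bytes (4-byte float32 value + 4-byte int32 column index), plus a small
# row-pointer array.
nnz = int(0.10 * n_cells * n_genes)
sparse32 = (nnz * 8 + (n_cells + 1) * 4) / 1024**3  # GiB

print(f"dense float64:          {dense64:.1f} GiB")
print(f"dense float32:          {dense32:.1f} GiB")
print(f"CSR float32 (10% nnz):  {sparse32:.1f} GiB")
```

Selecting 2k–10k highly variable genes on top of this shrinks the matrix by another order of magnitude, since the footprint is linear in the number of genes kept.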
If these don't help, please consider using the ComBat function implemented in scanpy: https://scanpy.readthedocs.io/en/latest/api/generated/scanpy.pp.combat.html
I run this process in a Jupyter notebook. I want to integrate my dataset with ComBat, but every time I run it, memory usage climbs above 300 GB, sometimes above 400 GB, and an error occurs, probably because of the excessive memory usage. My dataset contains 26,000 cells and 50,000 features, the h5ad file is about 11.19 GB, and my remote server has about 450 GB of memory. The error says: "MemoryError: Unable to allocate 121. GiB for an array with shape (26734, 606219) and data type float64"
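The size quoted in that traceback can be verified directly: a dense float64 array of shape (26734, 606219) needs rows × cols × 8 bytes, which is where the ~121 GiB comes from. (Note also that the second dimension, 606,219, is much larger than the 50,000 features mentioned above, so the array being allocated is evidently an internal intermediate rather than the input matrix itself — worth double-checking.)

```python
# Verify the allocation size reported in the MemoryError:
# a dense float64 array stores 8 bytes per entry.
rows, cols = 26_734, 606_219
gib = rows * cols * 8 / 1024**3
print(f"{gib:.0f} GiB")  # matches the "121. GiB" in the traceback
```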