scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.pp.combat runtime/out of memory #1977

Open sevahn opened 3 years ago

sevahn commented 3 years ago

I have a dataset with around 400K observations -- I wanted to perform batch correction using sc.pp.combat, but I'm getting out-of-memory errors after it runs for a couple of hours with > 2 TB of memory.

My understanding is that combat operates on a dense matrix, which requires a lot of memory. Why is this the case? Are there suggested workarounds?
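For context, a rough back-of-envelope estimate of the dense footprint (assuming a float64 dense conversion of the input; the gene count here is hypothetical, adjust to your dataset):

```python
# Rough estimate of the dense matrix footprint alone, before any
# intermediate arrays the algorithm allocates on top of it.
n_obs = 400_000          # observations reported in this issue
n_vars = 20_000          # hypothetical gene count; adjust to your dataset
bytes_per_value = 8      # float64

gib = n_obs * n_vars * bytes_per_value / 1024**3
print(f"dense matrix alone: ~{gib:.0f} GiB")  # ~60 GiB at these dimensions
```

Intermediate copies made during standardization and fitting can multiply this several times over, which is how a run can climb into the TB range.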

chris-rands commented 3 years ago

Do you need to use combat? I'd suggest trying sc.external.pp.bbknn() instead.

rbf22 commented 2 years ago

There is a really nice version of combat in the sva package that includes a reference batch. If this were added as a feature, you could perform your corrections separately for each sample. It might be a fairly easy addition.
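To illustrate the reference-batch idea, here is a minimal numpy sketch. This is NOT ComBat (there is no empirical-Bayes shrinkage and no covariate model, and the function name is made up for illustration); it only shows the core behavior: the reference batch is left untouched, and every other batch is shifted and rescaled toward the reference's per-gene statistics.

```python
import numpy as np

def refbatch_adjust_sketch(X, batches, ref):
    """Toy location/scale adjustment toward a reference batch.

    X       : (n_obs, n_genes) array
    batches : (n_obs,) array of batch labels
    ref     : label of the batch to leave unchanged
    """
    X = X.astype(float).copy()
    ref_mask = batches == ref
    ref_mean = X[ref_mask].mean(axis=0)
    ref_std = X[ref_mask].std(axis=0) + 1e-8  # avoid division by zero
    for b in np.unique(batches):
        if b == ref:
            continue  # the reference batch is not modified
        m = batches == b
        mu = X[m].mean(axis=0)
        sd = X[m].std(axis=0) + 1e-8
        # Standardize this batch, then map it onto the reference scale.
        X[m] = (X[m] - mu) / sd * ref_std + ref_mean
    return X
```

Because only one non-reference batch is touched at a time, a correction like this could also be applied batch by batch without holding everything dense at once.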

whitleyo commented 8 months ago

I'm wondering if it would be possible to make a gradient-descent-based version of ComBat or similar. It would involve some benchmarking, but presumably you could get past the memory issue by streaming the data in mini-batches, letting the final weights and correction be informed by the whole dataset without needing all of it in memory at once. This could possibly be implemented with a PyTorch backend.
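The streaming part of that idea can be sketched without any deep-learning framework. The toy function below (the name and interface are made up for illustration) accumulates per-batch gene means in a single pass over chunks; a full implementation would accumulate gradient updates to correction parameters the same way, so the full dense matrix never has to be resident at once.

```python
import numpy as np

def streaming_batch_means(chunks, batch_chunks, batch_ids, n_genes):
    """One-pass, chunked accumulation of per-batch per-gene means.

    chunks       : iterable of (chunk_obs, n_genes) arrays
    batch_chunks : iterable of matching (chunk_obs,) label arrays
    batch_ids    : list of batch labels to track
    """
    sums = {b: np.zeros(n_genes) for b in batch_ids}
    counts = {b: 0 for b in batch_ids}
    for X, batches in zip(chunks, batch_chunks):
        # Only this chunk is in memory; accumulate running sums per batch.
        for b in batch_ids:
            mask = batches == b
            sums[b] += X[mask].sum(axis=0)
            counts[b] += int(mask.sum())
    return {b: sums[b] / max(counts[b], 1) for b in batch_ids}
```

The same pattern maps directly onto a PyTorch DataLoader: each chunk becomes a mini-batch, and the accumulated statistics become learnable offset/scale parameters updated by an optimizer.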