Open sevahn opened 3 years ago
Do you need to use ComBat? If not, I'd suggest trying sc.external.pp.bbknn()
There is a really nice version of ComBat in the sva package that supports a reference batch (the ref.batch argument). If this were added as a feature, you could perform your corrections separately for each sample. This might be a pretty easy addition.
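To make the reference-batch idea concrete, here is a minimal numpy sketch of a location/scale adjustment toward a reference batch. This is an illustration of the concept only, not sva's actual ComBat: the real implementation also fits empirical-Bayes-shrunken batch effects, which are omitted here.

```python
import numpy as np

def reference_batch_correct(X, batches, ref):
    """Toy per-gene location/scale adjustment toward a reference batch.

    The reference batch is left untouched; every other batch is shifted
    and scaled so its per-gene mean and std match the reference.
    (sva's ComBat with ref.batch additionally shrinks the per-batch
    estimates via empirical Bayes; this sketch skips that.)
    """
    X = X.astype(float).copy()
    ref_mask = batches == ref
    ref_mean = X[ref_mask].mean(axis=0)
    ref_std = X[ref_mask].std(axis=0) + 1e-8   # guard against zero variance
    for b in np.unique(batches):
        if b == ref:
            continue
        m = batches == b
        mu = X[m].mean(axis=0)
        sd = X[m].std(axis=0) + 1e-8
        X[m] = (X[m] - mu) / sd * ref_std + ref_mean
    return X
```

Because each non-reference batch is corrected independently of the others, corrections could indeed be run separately per sample, which is what makes this attractive for the large-dataset case.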
I'm wondering whether it would be possible to make a gradient-descent-based version of ComBat or something similar. It would involve some benchmarking, but presumably you could get past the memory issue by streaming the data in batches, letting the final weights and correction be informed by the whole dataset without needing all of it in memory at once. This could possibly be implemented with a PyTorch backend.
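A minimal sketch of that streaming idea, in plain numpy rather than PyTorch: estimate additive per-batch, per-gene offsets by minibatch SGD over chunks (e.g. read from a disk-backed AnnData), so the full dense matrix never has to be in memory. This is a toy model only; real ComBat also handles multiplicative (variance) batch effects and empirical-Bayes shrinkage, which are left out here.

```python
import numpy as np

def sgd_batch_offsets(chunks, n_batches, n_genes, lr=0.2, epochs=50):
    """Fit additive batch offsets by minibatch SGD over streamed chunks.

    Minimizes sum_i ||x_i - alpha - gamma[batch_i]||^2, where `chunks`
    is a callable yielding (X_chunk, batch_ids) pairs. Only one chunk
    plus the (n_batches x n_genes) parameters live in memory at a time.
    """
    alpha = np.zeros(n_genes)                # global per-gene baseline
    gamma = np.zeros((n_batches, n_genes))   # additive batch effects
    for _ in range(epochs):
        for X, b in chunks():
            resid = X - alpha - gamma[b]     # gradient signal per cell
            alpha += lr * resid.mean(axis=0)
            for bid in np.unique(b):
                m = b == bid
                gamma[bid] += lr * resid[m].mean(axis=0)
    gamma -= gamma.mean(axis=0)              # identifiability: offsets sum to 0
    return alpha, gamma
```

Correction would then also be applied chunk-wise (X_chunk - gamma[b]), writing results back to disk, so peak memory stays at one chunk regardless of the total number of observations.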
I have a dataset with around 400K observations. I wanted to perform batch correction using sc.pp.combat, but I'm getting out-of-memory errors after it runs for a couple of hours and uses more than 2 TB of memory.
My understanding is that combat uses a dense matrix, which requires a lot of memory. Why is this? Are there any suggested workarounds?
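A back-of-the-envelope calculation shows why densifying is the problem at this scale. The gene count below is an assumption (the issue doesn't state it), and the exact peak multiplier depends on how many dense copies and intermediates the implementation creates:

```python
import numpy as np

# One dense float64 copy of a 400K-cell matrix. 30K genes is a guess;
# a standardized copy, design matrices, and per-gene intermediates can
# push peak usage to a multiple of this base figure.
n_obs, n_genes = 400_000, 30_000
dense_gb = n_obs * n_genes * np.dtype(np.float64).itemsize / 1024**3
print(f"one dense float64 copy: {dense_gb:.0f} GiB")  # ~89 GiB
```

So even a single dense copy is far beyond what a sparse count matrix occupies, and a few temporaries on top of that quickly reach the hundreds-of-GB-to-TB range reported above.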