tansey-lab / bayestme

BayesTME: A reference-free Bayesian method for analyzing spatial transcriptomics data

Out of Memory Error During Bleed Correction on CytAssist Capture Area #200

Open francescocister opened 6 days ago

francescocister commented 6 days ago

Dear Team,

First, thank you for providing this nice method! We are encountering an issue when using the bleed correction function on the CytAssist capture area, which has approximately 14,000 spots (including background). We have attempted to subset the data by selecting a random set of background spots (about 2,500 spots) and setting n_top = 10, but we consistently encounter an out-of-memory error.

System details:
- Number of CPUs: 1
- RAM: 512 GB

Could you please provide guidance on how to resolve or work around this memory issue?

Thank you for your support!

Best regards, Francesco

jeffquinn-msk commented 6 days ago

Hi Francesco,

Thanks for writing in. It sounds like our implementation of this has some non-linear complexity in the number of spots. I don't think we have personally tried this on the larger 14,000-spot slides (they didn't exist yet when we wrote the method). I can run it through the memory profiler and try to see what's going on.

Cheers,

Jeff

EmanuelSoda commented 3 days ago

Hi @jeffquinn-msk, what is the maximum number of spots you have tried it on?

Thanks, Emanuel

jeffquinn-msk commented 3 days ago

Personally I've only run this on the 6.5mm / 4,992-spot Visium datasets. I'm going to grab one of the public 11mm datasets now and see what's going on.

Cheers,

Jeff

jeffquinn-msk commented 2 days ago

Just for starters, I ran the pipeline on 11mm samples from the 10x public datasets. It worked fine, but it peaked at 60 GB of memory usage during bleed correction, which is obviously not ideal.

[Screenshot: memory profiler output, 2024-10-25]

If your system has 512 GB of memory, @francescocister, I'm not sure why it would have failed... Were you using the Nextflow pipeline I provide, or running it another way?

francescocister commented 2 days ago

Hi @jeffquinn-msk, thank you very much for helping us with this! I just adapted the notebook you provided as a tutorial to our samples; I don't think that would make any difference (would it?). Does the number of HVGs affect memory usage in the bleed correction step? What threshold are you using? I guess that's not the issue, since it should be controlled by the n_top parameter, but I can try reducing the number of genes...

jeffquinn-msk commented 2 days ago

I'm trying to figure out now which parameters affect memory; it seems like the number of spots is the real issue (sorry, I'm not the original author of this component, so I'm learning here).

When you copied the code from those demo notebooks, did you use the exact same command to install the package, `pip install git+https://github.com/tansey-lab/bayestme@6cb143a`? That is pinned to an older version of the code, and we have made a lot of improvements since then, so that could be the main issue here.

The best and easiest way to run bayestme is the Nextflow workflow: https://bayestme.readthedocs.io/en/latest/nextflow.html

jeffquinn-msk commented 2 days ago

OK, so what we have determined is that the bleed correction algorithm has memory complexity O(N² · B), where N is the total number of spots and B is the number of basis functions (we use 4). For ~11k spots that gets into the 60 GB memory range.
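As a back-of-the-envelope sketch of that O(N² · B) scaling (my own rough estimate, not taken from the bayestme code: it assumes one dense float64 N×N matrix per basis function, so it's only a lower bound; intermediate copies during fitting will push the actual peak well above this, consistent with the observed 60 GB):

```python
def bleed_correction_mem_gib(n_spots: int, n_basis: int = 4,
                             bytes_per_value: int = 8) -> float:
    """Rough lower bound on memory for an O(N^2 * B) dense representation.

    n_spots: total number of spots (tissue + background).
    n_basis: number of basis functions (4 in this thread).
    bytes_per_value: 8 for float64.
    """
    return n_spots ** 2 * n_basis * bytes_per_value / 2 ** 30


# Compare a 6.5mm slide (~5k spots) with an 11mm CytAssist slide (~14k spots).
for n in (4992, 11000, 14000):
    print(f"{n:>6} spots -> at least {bleed_correction_mem_gib(n):.1f} GiB")
```

Because the dependence on N is quadratic, going from ~5k to ~14k spots multiplies this baseline by roughly 8x, which is why subsetting background spots only helps so much.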

This is unfortunately not something that will be trivial to improve in the short term. That said, I was able to run it on our cluster on the 11mm Visium slides; bleed correction took about 15 hours but did eventually work, so one option is to just run it overnight on a cluster. Another option for now would be to divide the slide into tiles and apply the correction to each tile independently.
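A minimal sketch of the tiling idea (a hypothetical helper, not part of the bayestme API; the name `tile_spots` and the grid logic are my assumptions): partition spot indices into a coarse grid by array coordinates, then run bleed correction on each tile's subset of the counts matrix. Note that spots near a tile boundary lose bleed signal from neighbors in adjacent tiles, so overlapping tiles would likely give cleaner results at the edges.

```python
def tile_spots(positions, n_tiles_x=2, n_tiles_y=2):
    """Partition spot indices into a grid of tiles by array coordinates.

    positions: sequence of (x, y) coordinates, one per spot.
    Returns a dict mapping (tile_x, tile_y) -> list of spot indices.
    """
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    x_min, y_min = min(xs), min(ys)
    # Avoid zero-width bins on degenerate (single-row/column) inputs.
    x_span = max(max(xs) - x_min, 1e-9)
    y_span = max(max(ys) - y_min, 1e-9)

    tiles = {}
    for i, (x, y) in enumerate(positions):
        tx = min(int((x - x_min) / x_span * n_tiles_x), n_tiles_x - 1)
        ty = min(int((y - y_min) / y_span * n_tiles_y), n_tiles_y - 1)
        tiles.setdefault((tx, ty), []).append(i)
    return tiles
```

With a 2x2 grid, each tile has roughly N/4 spots, so the quadratic term per tile drops to about 1/16 of the full-slide cost, at the price of the boundary effects mentioned above.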