francescocister opened 6 days ago
Hi Francesco,
Thanks for writing in. Sounds like there must be some non-linear complexity in the number of spots in our implementation of this. I don't think we have personally tried this on the larger 14,000-spot slides (they didn't exist yet when we wrote the method). I can run it through the memory profiler and try to see what's going on.
Cheers,
Jeff
Hi @jeffquinn-msk, what is the maximum number of spots you have tried it on?
Thanks, Emanuel
Personally I've only run this on the 6.5mm / 4992-spot Visium datasets. Going to grab one of the public 11mm datasets now and see what's going on.
Cheers,
Jeff
Just for starters, I ran the pipeline on 11mm samples from 10x public datasets. It worked fine, but peaked at 60GB memory usage for the bleed correction, which is obviously not ideal.
If your system @francescocister has 512GB of memory, I'm not sure why it would have failed. Were you using the nextflow pipeline I provide, or running it another way?
Hi @jeffquinn-msk, thank you very much for helping us with this! I just adapted the notebook you provided as a tutorial to our samples; I don't think that would make any difference (does it?). Does the number of HVGs affect the memory usage in the bleed correction step? What threshold are you using? I guess it's not the case, as that should be controlled by the n_top parameter, but I can try to reduce the number of genes...
I'm trying to figure out now which parameters affect memory, it seems like number of spots is the real issue (sorry I'm not the original author of this component so I'm learning here).
When you copied the code from those demo notebooks, did you use the exact same command to install the package, `pip install git+https://github.com/tansey-lab/bayestme@6cb143a`? This is pinned to an older version of the code; we have made a lot of improvements since then, so that could be the main issue here.
The best and easiest way to run bayestme is the nextflow workflow: https://bayestme.readthedocs.io/en/latest/nextflow.html
OK, so what we have determined is that the bleed correction algorithm has memory complexity O(N^2 * B), where N is the total number of spots and B is the number of basis functions (we use 4). So for 11k spots, that gets into the 60GB memory range.
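For anyone wanting a rough feel for the scaling, here is a back-of-envelope estimate (this is my own sketch, not code from the package; the 15x constant factor is just inferred from the observed 60GB, not measured):

```python
def bleed_correction_memory_gb(n_spots, n_basis=4, bytes_per_float=8):
    """Lower bound: one dense N x N x B float64 array,
    ignoring temporaries created during optimization."""
    return n_spots ** 2 * n_basis * bytes_per_float / 1e9

# A single dense N x N x B float64 array alone:
#   ~0.8 GB at 4,992 spots
#   ~3.9 GB at 11,000 spots
# The observed ~60GB at 11k spots suggests a constant factor of
# very roughly 15x on top of this from intermediate copies.
print(bleed_correction_memory_gb(4992))   # ~0.797
print(bleed_correction_memory_gb(11000))  # ~3.872
```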
This is unfortunately not something that will be trivial to improve on in the short term. That being said, I was able to run it on our cluster on the 11mm Visium slides; the runtime was about 15h for bleed correction, but it did eventually work, so that's one option if you can just run it overnight on a cluster. Another option for now would be to divide the slide into tiles and apply the correction to each independently.
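A minimal sketch of the tiling idea, assuming you have an (N, 2) array of spot coordinates (the `tile_spots` helper is hypothetical, not part of bayestme; you would then run bleed correction on each index subset and stitch the results back together):

```python
import numpy as np

def tile_spots(positions, n_tiles_x=2, n_tiles_y=2):
    """Assign each spot to one tile of an n_tiles_x by n_tiles_y grid.

    Tile boundaries are coordinate quantiles, so tiles hold roughly
    equal numbers of spots. Returns a list of index arrays, one per tile.
    """
    pos = np.asarray(positions)
    # Interior quantile edges along each axis (quantile endpoints dropped)
    x_edges = np.quantile(pos[:, 0], np.linspace(0, 1, n_tiles_x + 1))[1:-1]
    y_edges = np.quantile(pos[:, 1], np.linspace(0, 1, n_tiles_y + 1))[1:-1]
    xi = np.digitize(pos[:, 0], x_edges)  # tile column, 0..n_tiles_x-1
    yi = np.digitize(pos[:, 1], y_edges)  # tile row, 0..n_tiles_y-1
    tile_id = xi * n_tiles_y + yi
    return [np.flatnonzero(tile_id == t) for t in range(n_tiles_x * n_tiles_y)]
```

With 2x2 tiles each tile has ~N/4 spots, so the quadratic term drops by ~16x; the trade-off is that bleed across tile boundaries is no longer modeled.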
Dear Team,

First, thank you for providing this nice method! We are encountering an issue when using the bleed correction function on the CytAssist capture area, which has approximately 14,000 spots (including background). We have attempted to subset the data by selecting a random set of background spots (about 2,500 spots) and setting n_top = 10. However, we consistently encounter an "out of memory" error.

System details:
- Number of CPUs: 1
- RAM: 512 GB
Could you please provide guidance on how to resolve or work around this memory issue?
Thank you for your support!
Best regards, Francesco