mbernste / SpatialCorr

SpatialCorr: Identify gene sets with spatially varying correlation
MIT License

MemoryError: Unable to allocate 932 GiB for an array with shape (353762, 353762) #5

Closed: bmill3r closed this issue 1 year ago

bmill3r commented 1 year ago

Hello,

First, thanks for this great tool!

I am trying to apply it to a fairly large, single-cell-resolution Vizgen dataset of human lung cancer (Vizgen_Human_LungCancer_Patient1) from the MERSCOPE FFPE Human Immuno-oncology data release. The dataset contains 353762 cells. The problem occurs when I apply several SpatialCorr functions, for example spatialcorr.wrappers.kernel_diagnostics().

I receive the following error: MemoryError: Unable to allocate 932. GiB for an array with shape (353762, 353762) and data type float64.

I think the function is building a spatial graph of shape (353762, 353762), which needs 932 GiB of memory, far more than I have available, even on an HPC node. Memory grows with the square of the number of cells, so subsetting helps but not enough: if I subset the data to about 25% of the cells, I get a similar error: MemoryError: Unable to allocate 64.8 GiB for an array with shape (93270, 93270) and data type float64
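The numbers in both error messages follow directly from the size of a dense float64 array: N × N entries at 8 bytes each. A minimal sketch of that arithmetic (the helper name `dense_matrix_gib` is hypothetical, used only for illustration):

```python
def dense_matrix_gib(n_cells, itemsize=8):
    """Memory, in GiB, of a dense n_cells x n_cells array (8 bytes per float64)."""
    return n_cells * n_cells * itemsize / 2**30

full = dense_matrix_gib(353762)            # full dataset, float64
subset = dense_matrix_gib(93270)           # ~26% subset, float64
full_f32 = dense_matrix_gib(353762, 4)     # float32 only halves the requirement
```

Because memory scales with N², keeping about a quarter of the cells cuts memory roughly 14-fold here, and switching to float32 would still leave about 466 GiB for the full dataset.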

Have you come across this error before, and if so, do you have any suggestions to overcome it? Is there some way to read/write and store the data more efficiently so that I can apply SpatialCorr to single cell resolution datasets? Squidpy seems to be able to handle datasets of this size, so maybe there is something they do that SpatialCorr can also incorporate?

Thanks! Brendan

mbernste commented 1 year ago

Hi Brendan,

Thanks for the question, and sorry for the trouble! We developed SpatialCorr primarily for 10x Visium, where there are far fewer spots. Unfortunately, I believe the current implementation will not scale to datasets as large as those MERSCOPE produces. The reason is that it computes a full kernel matrix to estimate spot-wise correlations, and that matrix is of size N×N, where N is the number of cells. So the issue isn't storing the data itself, but rather the calculations needed to estimate spot-wise correlations.
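To make the N×N cost concrete, here is a minimal sketch of a Gaussian-kernel-weighted local correlation between two genes computed with a dense kernel. This is a hypothetical illustration of the general idea, not SpatialCorr's actual code; the function name `dense_kernel_local_corr` and the `bandwidth` parameter are assumptions.

```python
import numpy as np

def dense_kernel_local_corr(coords, x, y, bandwidth):
    """Hypothetical illustration (not SpatialCorr's code): kernel-weighted
    Pearson correlation of genes x and y around every cell."""
    # Pairwise squared distances: an (N, N) float64 array, i.e. 8 * N**2 bytes.
    sq = (coords ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T, 0.0)
    # Dense Gaussian kernel, row-normalized so each row is a weight vector.
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    K /= K.sum(axis=1, keepdims=True)
    # Weighted means, covariance and variances around each cell.
    mx, my = K @ x, K @ y
    cov = K @ (x * y) - mx * my
    vx = K @ (x * x) - mx ** 2
    vy = K @ (y * y) - my ** 2
    return cov / np.sqrt(vx * vy)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
x = rng.normal(size=50)
corr = dense_kernel_local_corr(coords, x, 2 * x + 1, bandwidth=3.0)
```

Every step materializes or multiplies by the full (N, N) kernel, which is why memory, not the expression data, becomes the bottleneck as N grows.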

My suggestion would be to apply SpatialCorr to only small regions of the sample that you are interested in testing for spatial variation in correlation.
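A minimal sketch of cropping to a rectangular region of interest before running the analysis, assuming cell centroids are available as an (N, 2) NumPy array. In AnnData objects they are often stored in `adata.obsm["spatial"]`, but that depends on how the data were loaded; the helper `roi_mask` is hypothetical.

```python
import numpy as np

def roi_mask(coords, x_range, y_range):
    """Hypothetical helper: boolean mask of cells whose centroid lies in the
    half-open rectangle [x0, x1) x [y0, y1)."""
    x, y = coords[:, 0], coords[:, 1]
    return (
        (x >= x_range[0]) & (x < x_range[1])
        & (y >= y_range[0]) & (y < y_range[1])
    )

coords = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0], [5.0, 15.0]])
mask = roi_mask(coords, x_range=(0, 10), y_range=(0, 10))

# A dense float64 N x N kernel for the cropped region needs 8 * n**2 bytes.
n_roi = int(mask.sum())
roi_kernel_bytes = 8 * n_roi ** 2
```

The same mask can then index the expression matrix (or an AnnData object, e.g. `adata[mask]`) so that only the cropped cells are passed on.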

All of this being said, I do think it will be possible to make SpatialCorr scale to larger samples by changing how the kernel estimation is implemented. I'm not sure when I will be able to get to it, though.
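One possible way to change the kernel estimation, similar in spirit to the sparse neighbor graphs that tools like Squidpy build, is to truncate the Gaussian kernel at a cutoff radius and store it as a sparse matrix, so memory grows with N times the typical number of neighbors rather than N². The sketch below is an assumption about how that could look, not the maintainers' planned change; `sparse_kernel_local_corr` and its `cutoff` parameter are hypothetical. It uses SciPy's `cKDTree.query_pairs` to find neighbor pairs.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.spatial import cKDTree

def sparse_kernel_local_corr(coords, x, y, bandwidth, cutoff=3.0):
    """Hypothetical sketch: local correlation with a Gaussian kernel truncated
    at cutoff * bandwidth and stored sparsely."""
    n = coords.shape[0]
    # All neighbor pairs (i < j) within the cutoff radius, as an (M, 2) array.
    pairs = cKDTree(coords).query_pairs(cutoff * bandwidth, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    w = np.exp(-((coords[i] - coords[j]) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2))
    # Symmetric weights plus a self-weight of 1 on the diagonal.
    self_idx = np.arange(n)
    rows = np.concatenate([i, j, self_idx])
    cols = np.concatenate([j, i, self_idx])
    vals = np.concatenate([w, w, np.ones(n)])
    K = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    # Row-normalize so each row is a weight vector.
    K = diags(1.0 / np.asarray(K.sum(axis=1)).ravel()) @ K
    mx, my = K @ x, K @ y
    cov = K @ (x * y) - mx * my
    vx = K @ (x * x) - mx ** 2
    vy = K @ (y * y) - my ** 2
    return cov / np.sqrt(vx * vy), K

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
corr, K = sparse_kernel_local_corr(coords, x, 2 * x + 1, bandwidth=1.0)
```

Truncation changes the estimator slightly (distant cells get zero weight instead of a tiny one), so the cutoff is a trade-off between accuracy and memory.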

bmill3r commented 1 year ago

Hi @mbernste,

Thanks for the prompt response! That makes sense, and thanks for clarifying the kernel estimation. If you ever get around to adapting SpatialCorr to larger samples, I would love to hear about it, but in the meantime I'll stick to smaller regions of interest.

Have a great weekend, Brendan