jianhuupenn / SpaGCN

SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
MIT License

Error: Unable to allocate 932. GiB for an array with shape (353762, 353762) #72

Open bmill3r opened 1 year ago

bmill3r commented 1 year ago

Hello,

First, thanks for this great tool!

I am trying to apply it to a fairly large single-cell-resolution Vizgen dataset of human lung cancer (Vizgen_Human_LungCancer_Patient1) from the MERSCOPE FFPE Human Immuno-oncology data release. The dataset contains 353762 cells. When I reach the step that calculates the adjacency matrix between cells: adj=spg.calculate_adj_matrix()

I receive the following error: MemoryError: Unable to allocate 932. GiB for an array with shape (353762, 353762) and data type float64.

Presumably a dense float64 weight matrix of shape (353762, 353762) requires 932 GiB of memory, which is far more than I have available, even on an HPC node. Note that if I subset the data to about 25%, I get a similar error: MemoryError: Unable to allocate 64.8 GiB for an array with shape (93270, 93270) and data type float64
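As a sanity check on those numbers: a dense float64 array of shape (n, n) needs n × n × 8 bytes, so the requirement grows quadratically with the number of cells. A minimal sketch of the arithmetic follows; the helper name dense_gib is made up for illustration and is not part of SpaGCN or NumPy.

```python
def dense_gib(n_cells, bytes_per_value=8):
    """Memory in GiB for a dense (n_cells, n_cells) array.

    bytes_per_value defaults to 8, the size of one float64.
    dense_gib is a hypothetical helper, not a SpaGCN or NumPy function.
    """
    return n_cells * n_cells * bytes_per_value / 2**30


# The two array shapes reported in the error messages.
for n in (353762, 93270):
    print(f"{n} cells -> {dense_gib(n):.1f} GiB")
```

Because the cost is quadratic, subsetting to roughly a quarter of the cells cuts the requirement by a factor of about 14 rather than 4, which still leaves it at tens of GiB.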

Have you come across this error before, and if so, do you have any suggestions to overcome it? Is there some way to compute and store the adjacency data more efficiently so that I can apply SpaGCN to single-cell-resolution datasets? Squidpy seems to be able to handle datasets of this size, so maybe there is something they do that SpaGCN can also incorporate?
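To illustrate the kind of approach I have in mind: one common way to avoid the quadratic cost is to keep only each cell's k nearest neighbours in a sparse matrix instead of all pairwise distances. The sketch below uses SciPy's cKDTree and csr_matrix. The function name knn_adjacency and the default k=10 are assumptions chosen for illustration; this is not SpaGCN's API, and I have not verified whether SpaGCN's downstream steps would accept a sparse matrix in place of the dense one.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def knn_adjacency(coords, k=10):
    """Sparse (n, n) matrix of distances to each point's k nearest neighbours.

    Hypothetical helper for illustration only; not part of SpaGCN.
    Assumes coords has more than k rows and no duplicated coordinates.
    """
    coords = np.asarray(coords, dtype=np.float64)
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Ask for k + 1 neighbours: the closest match of every point is itself.
    dist, idx = tree.query(coords, k=k + 1)
    rows = np.repeat(np.arange(n), k)   # each row index repeated k times
    cols = idx[:, 1:].ravel()           # drop the self-match in column 0
    vals = dist[:, 1:].ravel()
    return csr_matrix((vals, (rows, cols)), shape=(n, n))


# Small synthetic example: 1000 random 2-D points stand in for cell centroids.
rng = np.random.default_rng(0)
coords = rng.random((1000, 2))
adj = knn_adjacency(coords, k=10)
print(adj.shape, adj.nnz)
```

This stores n × k values rather than n × n, so for 353762 cells and k=10 it would hold about 3.5 million entries, on the order of tens of MB instead of 932 GiB. Note that the result is not symmetric: cell j can be among cell i's nearest neighbours without the reverse being true.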

Thanks! Brendan