lmweber / nnSVG

nnSVG: scalable method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data

Memory usage / HDF5 / parallelization #2

Closed lmweber closed 2 years ago

lmweber commented 3 years ago

Memory usage is currently high when running in parallel: the full matrix of pre-processed expression values y is passed to each thread, so total memory usage is at least object.size(y) * n_threads.
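To make the lower bound concrete, here is a minimal sketch in base R. The matrix dimensions and the value of n_threads are illustrative assumptions, not numbers from nnSVG or from a real dataset.

```r
# Toy stand-in for the pre-processed expression matrix y (genes x spots).
# Dimensions and thread count are illustrative assumptions.
set.seed(1)
y <- matrix(rnorm(1000 * 200), nrow = 1000, ncol = 200)
n_threads <- 4

# Size in bytes of a single in-memory copy of y
bytes_per_copy <- as.numeric(object.size(y))

# Lower bound on total memory if every thread holds its own full copy of y
lower_bound_bytes <- bytes_per_copy * n_threads

# Report the bound in megabytes
lower_bound_mb <- lower_bound_bytes / 1024^2
print(lower_bound_mb)
```

This counts only the copies of y; any per-thread working memory comes on top of it.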

It should be possible to reduce this by adding support for HDF5 / DelayedArray objects to store y on-disk.

Code from my initial attempt is in this commit: https://github.com/lmweber/nnSVG/commit/65daf9e319d4c0a634cdb6fe20176347a0bb027b

This additionally requires setting up block processing manually so that it interacts correctly with BiocParallel.
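As a rough illustration of the block-processing idea (not the code from the commit above), the sketch below splits the rows of y into blocks so that each task receives only its own block. It uses base R only: fit_gene is a hypothetical placeholder for the per-gene model fit, n_blocks is an arbitrary choice, and the final lapply() stands in for a parallel call such as BiocParallel::bplapply(). With an on-disk y, the subsetting step would presumably read each block from disk rather than from an in-memory matrix.

```r
# Toy stand-in for y (genes x spots); sizes are illustrative assumptions.
set.seed(1)
y <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
n_blocks <- 4

# Assign each gene (row) to one of n_blocks contiguous blocks of row indices
block_id <- cut(seq_len(nrow(y)), breaks = n_blocks, labels = FALSE)
row_blocks <- split(seq_len(nrow(y)), block_id)

# Materialize one sub-matrix per block, so a task never needs the full y
y_blocks <- lapply(row_blocks, function(ix) y[ix, , drop = FALSE])

# Hypothetical per-gene computation; a placeholder, not the nnSVG model fit
fit_gene <- function(y_i) var(y_i)

# Process one block: apply the per-gene function to each row of the block
process_block <- function(y_block) apply(y_block, 1, fit_gene)

# Serial stand-in for the parallel loop; each task sees only its own block
res_blocks <- lapply(y_blocks, process_block)

# Recombine per-block results in the original gene order
res <- unlist(res_blocks, use.names = FALSE)
```

Because process_block() takes the block as its argument and does not refer to y, only the block itself needs to be sent to a worker when the loop is parallelized.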