Open jiadongm opened 6 months ago
Thank you for your feedback. The problem is that Sincast imputation computes a huge dense distance matrix, which is very memory-inefficient. We suggest that you work on a smaller subset of cells for now. In the meantime, we are working on fixing the issue.
I was using sincastImp to impute a scRNA-seq dataset with >30,000 cells and 2,000 genes. I noticed that the imputed sce object became extremely large (>20G). Initially I thought this was because the imputed gene expression matrix is dense and therefore takes up a lot of space to store.
Then I subsetted the imputed sce object down to ~1,500 cells, but the sce object still took up to 20G of space. After a lot of trial and error, I realised this is because metadata(sce) stores huge matrices, including a 30,000 x 30,000 distance matrix, and is therefore very large.
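As a rough sanity check on the sizes involved, here is a back-of-the-envelope calculation in Python. It assumes the distance matrix is stored dense with 8-byte double-precision values (the usual numeric type in R); the helper name `dense_matrix_bytes` is made up for illustration and is not part of Sincast.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Approximate payload size of a dense matrix, ignoring object overhead.

    bytes_per_value=8 assumes double-precision storage (an assumption,
    not something confirmed from the Sincast source).
    """
    return n_rows * n_cols * bytes_per_value


# Cell-by-cell distance matrix for the full dataset (30,000 cells).
full = dense_matrix_bytes(30_000, 30_000)

# What the same matrix would cost if it shrank with the 1,500-cell subset.
subset = dense_matrix_bytes(1_500, 1_500)

print(f"full:   {full / 1e9:.1f} GB")    # full:   7.2 GB
print(f"subset: {subset / 1e6:.1f} MB")  # subset: 18.0 MB
```

Under that assumption, a single 30,000 x 30,000 matrix accounts for roughly 7 GB, so a few matrices of this size in the metadata would be enough to explain an object of about 20G that does not shrink when cells are removed.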
I was wondering whether this could be fixed to make the pipeline more memory efficient? One possibility is not to create the metadata by default.