meiosis97 / Sincast

This is the very first Sincast version!

big metadata #2

Open jiadongm opened 6 months ago

jiadongm commented 6 months ago

I was using sincastImp to impute a scRNA-seq dataset with >30,000 cells and 2,000 genes. I realised that the imputed sce object became extremely large (>20 GB). Initially I thought this was because the imputed gene expression matrix is dense and therefore takes up a lot of space to store.

Then I subsetted the imputed sce object down to ~1,500 cells, but the sce object still took up 20 GB of space. After a lot of trial and error I realised this is because metadata(sce) stores huge matrices, including a 30,000 x 30,000 distance matrix, which made the object very large.
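As a rough check on the sizes described above, the following sketch estimates the memory footprint of the dense matrices involved. It assumes 8-byte double-precision values (the usual storage for numeric matrices in R) and ignores object overhead. The helper `dense_matrix_bytes` is a hypothetical name used for illustration and is not part of Sincast.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Approximate size of a dense matrix, assuming 8-byte doubles by default."""
    return n_rows * n_cols * bytes_per_value


n_cells = 30_000
n_genes = 2_000

# Cell-by-cell distance matrix kept in the metadata.
distance_bytes = dense_matrix_bytes(n_cells, n_cells)

# Dense imputed expression matrix (cells x genes).
expression_bytes = dense_matrix_bytes(n_cells, n_genes)

print(f"distance matrix:   {distance_bytes / 1e9:.2f} GB")
print(f"expression matrix: {expression_bytes / 1e9:.2f} GB")
```

Under these assumptions a single 30,000 x 30,000 distance matrix needs about 7.2 GB, while the dense 30,000 x 2,000 expression matrix needs only about 0.48 GB. This is consistent with the observation that the metadata, rather than the imputed expression matrix, dominates the object size.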

I was wondering whether this could be fixed to make the pipeline more memory efficient? One possibility is not to create the metadata by default.

meiosis97 commented 5 months ago

Thank you for your feedback. The problem arises because Sincast imputation calculates a huge dense distance matrix, which is very memory inefficient. We suggest that you work on a smaller subset of cells for now. In the meantime, we are working on fixing the issue.