big metadata

I was using sincastImp to impute an scRNA-seq dataset with >30,000 cells and 2,000 genes. I noticed that the imputed sce object became extremely large (>20 GB). Initially I thought this was because the imputed gene expression matrix is dense and therefore takes a lot of space to store.

I then subset the imputed sce object down to ~1,500 cells, but it still took up ~20 GB. After a lot of trial and error I realised this is because metadata(sce) stores huge matrices, including a 30,000 x 30,000 distance matrix, which keeps the object very large regardless of how many cells remain.
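For a rough sense of scale, here is a back-of-the-envelope calculation (in Python, just for the arithmetic). It assumes the distance matrix is stored dense with 8-byte doubles, which is how R holds an ordinary numeric matrix; I have not confirmed the exact storage class Sincast uses, and `dense_matrix_gib` is only a helper name made up for this sketch.

```python
# Estimate the in-memory size of a dense matrix of 8-byte doubles.
# Assumption: the matrices in metadata(sce) are dense numeric matrices.
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Return the approximate size in GiB of an n_rows x n_cols dense matrix."""
    return n_rows * n_cols * bytes_per_value / 1024**3


# Cell-by-cell distance matrix for the full dataset (30,000 cells).
full = dense_matrix_gib(30_000, 30_000)

# What the same matrix would cost if it shrank with the subset (1,500 cells).
subset = dense_matrix_gib(1_500, 1_500)

print(f"30,000 x 30,000: {full:.2f} GiB")
print(f"1,500 x 1,500:   {subset:.3f} GiB")
```

Under that assumption a single 30,000 x 30,000 matrix is already about 6.7 GiB, so a few matrices of that shape would be enough to explain an object of ~20 GB, whereas the 1,500-cell equivalent would be 400 times smaller.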

Could this be fixed to make the pipeline more memory efficient? One possibility would be not to create the metadata by default.

meiosis97 / Sincast

big metadata #2