smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

Subsampling dataset for TestSoftPowers #152

Closed Thapeachydude closed 8 months ago

Thapeachydude commented 8 months ago

Hi,

is it possible to subset the dataset (e.g. by randomly sampling 25-50% of the dataset) after NormalizeMetacells(), before TestSoftPowers()?

I'm asking because for larger datasets (> 100k cells) with multiple grouping variables (cell type, sample, condition) the memory demand increases quite a bit and I'm getting OOM kills. E.g. I've tried running 120'000 cells (3 grouping variables) with 25 cores, each with 15GB memory and get a kill after 20h (usage of both cores and memory is 99%).

Hence it would be very convenient if one could simply sample from the dataset and still get a reasonable softpower estimate.

Happy about any feedback and alternative suggestions : )

Cheers, M

smorabit commented 8 months ago

Yes, it is possible to subset the data. Here is a quick example that randomly samples 500 metacells from your metacell Seurat object:

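# pull the metacell Seurat object out of the main Seurat object
m_obj <- GetMetacellObject(seurat_obj)
# randomly keep 500 metacells (columns); call set.seed() first for a reproducible draw
m_obj <- m_obj[, sample(colnames(m_obj), 500)]
# store the subsampled metacell object back in the main object
seurat_obj <- SetMetacellObject(seurat_obj, m_obj)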

In practice you might want to sample more carefully, for example by stratifying on your biological samples so that each one stays represented. Hope this helps!
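
One way the stratified sampling could look is sketched below. This is a rough illustration and not part of hdWGCNA: `stratified_sample` is a hypothetical helper written in base R, and the `Sample` metadata column is an assumed name, so swap in whatever grouping column your metacell object actually carries. It draws the same fraction of metacells from every group and keeps at least one per group.

```r
# Hypothetical helper (not an hdWGCNA function): sample a fixed fraction of
# metacells within each group so that no group drops out entirely.
# `meta` is a data frame with one row per metacell (row names = metacell names);
# `group_col` is the name of the column to stratify on.
stratified_sample <- function(meta, group_col, frac = 0.25, seed = 1) {
  set.seed(seed)  # make the draw reproducible
  groups <- split(rownames(meta), meta[[group_col]], drop = TRUE)
  picked <- lapply(groups, function(ids) {
    n <- max(1, round(length(ids) * frac))  # keep at least one per group
    ids[sample.int(length(ids), n)]
  })
  unlist(picked, use.names = FALSE)
}

# Toy metadata standing in for the metacell object's meta.data
toy_meta <- data.frame(
  Sample = rep(c("S1", "S2", "S3"), times = c(40, 20, 8)),
  row.names = paste0("metacell_", 1:68)
)
keep <- stratified_sample(toy_meta, "Sample", frac = 0.25)
table(toy_meta[keep, "Sample"])

# With a real object (not run here; assumes a "Sample" column exists):
# m_obj <- GetMetacellObject(seurat_obj)
# keep <- stratified_sample(m_obj@meta.data, "Sample", frac = 0.25)
# seurat_obj <- SetMetacellObject(seurat_obj, m_obj[, keep])
```

To stratify on several variables at once, one option is to build a combined column first, e.g. with `interaction()`, and pass that column name instead.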

Thapeachydude commented 8 months ago

Cool, thanks a lot. I assume this is after normalizing metacells?

smorabit commented 8 months ago

You could do this before or after normalizing. I don't think it would make a difference, as long as you normalize at some point.