Closed · s2hui closed this issue 3 years ago
I would recommend integrating only the variable genes rather than all genes; this should substantially reduce the memory requirements. Typically, the integrated data is used to compute a new PCA, in which case you only need the variable genes.
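For concreteness, here is a minimal sketch of what that looks like in the standard (LogNormalize) workflow; obj.list is a placeholder name for your list of Seurat objects, and the key point is that IntegrateData() left at its defaults only integrates the anchor (variable) features:

# Select shared variable features and anchor on them only.
features <- SelectIntegrationFeatures(object.list = obj.list, nfeatures = 2000)
anchors <- FindIntegrationAnchors(object.list = obj.list, anchor.features = features)

# Default behavior: only the anchor features are integrated (memory-friendly).
combined <- IntegrateData(anchorset = anchors, dims = 1:30)

# Avoid features.to.integrate = rownames(obj.list[[1]]) on large data sets --
# that forces integration of every gene and is the usual memory blow-up.

# The integrated assay is then used for a new PCA, which only needs
# the variable genes anyway.
combined <- ScaleData(combined, verbose = FALSE)
combined <- RunPCA(combined, verbose = FALSE)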
Thanks, your suggestion worked!
Hi, I have a follow-up question regarding this: which function and argument should I adjust to integrate only the variable genes when using SCTransform?
I have similar code to what @s2hui had:
list <- lapply(X = list, FUN = SCTransform, method = "glmGamPoi")  # residual.features = ? (left unset)
features <- SelectIntegrationFeatures(object.list = list, nfeatures = 3000)
list <- PrepSCTIntegration(object.list = list, anchor.features = features)
list <- lapply(X = list, FUN = RunPCA, features = features)
anchors <- FindIntegrationAnchors(object.list = list, anchor.features = features,
                                  normalization.method = "SCT", reduction = "rpca", k.anchor = 5)
combo <- IntegrateData(anchorset = anchors, normalization.method = "SCT", dims = 1:30)
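If I understand the Seurat documentation correctly (worth double-checking against your installed version), the relevant knob is not residual.features in SCTransform() but features.to.integrate in IntegrateData(): when left at its default of NULL, only the anchor features are integrated, so the call above already restricts integration to the 3000 variable genes. Making the default explicit, as a sketch:

combo <- IntegrateData(anchorset = anchors, normalization.method = "SCT",
                       dims = 1:30, features.to.integrate = features)
# By contrast, features.to.integrate = rownames(list[[1]]) would integrate
# every gene -- the memory-heavy path to avoid.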
Hello,
I have 55 single-cell data sets I would like to integrate (consisting of over 200K cells). Each data set belongs to one of six histologies in the disease we are studying.
An initial rpca integration using 6 references ran out of memory (running with 1 TB of memory, 1 node, 1 core).
Previously, I had successfully integrated 25 data sets (100K cells, 1 histology) using rpca (running with 180 GB of memory, 1 node, 1 core).
Rough code:
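(The original code block appears to be missing here. Based on the "similar code" posted in the follow-up comment above, it was presumably along these lines, with the six reference data sets passed via the reference argument of FindIntegrationAnchors(); this is a reconstruction, not the poster's actual code.)

list <- lapply(X = list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = list, nfeatures = 3000)
list <- PrepSCTIntegration(object.list = list, anchor.features = features)
list <- lapply(X = list, FUN = RunPCA, features = features)
anchors <- FindIntegrationAnchors(object.list = list, anchor.features = features,
                                  normalization.method = "SCT", reduction = "rpca",
                                  reference = c(1, 2, 3, 4, 5, 6))  # indices of the 6 reference data sets
combo <- IntegrateData(anchorset = anchors, normalization.method = "SCT", dims = 1:30)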
I am wondering if there is anything I can do to address the memory issue ("Cholmod error: problem too large"). For example, I currently use 6 data sets as references (~30K cells); maybe I should reduce this?
I have noticed that others have run integration successfully on 500K cells (#3889), albeit integrating only 2 data sets.
Thanks for any insight, shui