satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

"RunPCA" function occupies all threads #8363

Closed Tongbcc closed 7 months ago

Tongbcc commented 9 months ago

When I run RunPCA(), it occupies all remaining threads on the system. Because I need to calculate a large number of PCs, the computation takes a long time, and occupying every thread for that long is not allowed by our regulations. When I use options(mc.cores = 15) and

library(future) 
plan("multisession", workers=15)

neither of these methods has any effect. The original code is as follows:

args <- commandArgs(trailingOnly = TRUE)
if(length(args)!=1){
  print("Usage:")
  print("        Rscript extract_PCs.R RDS")
  q(save = "no")
}

library(Seurat)
library(dplyr)

# Set the maximum number of threads to 15
options(mc.cores = 15)

data <- readRDS(args[1])
data <- RunPCA(data, npcs = 800)
png("elbow.png")
ElbowPlot(data, ndims = 800)
dev.off()

dcollins15 commented 8 months ago

Have you tried passing approx = FALSE to RunPCA? It's possible that irlba is doing some parallelization under the hood that isn't subject to mc.cores or future::plan.
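
One way to test that hypothesis is to cap the threads of the linear algebra backend directly. The sketch below is not a confirmed fix for this issue: it assumes the extra threads come from a multithreaded BLAS (for example OpenBLAS or MKL) or from OpenMP, and it assumes the RhpcBLASctl package is installed. The name max_threads is hypothetical and simply mirrors the limit of 15 used above.

```r
# Minimal sketch: limit BLAS/OpenMP threads before calling RunPCA().
# Assumption: the threads are spawned by the BLAS/OpenMP backend,
# which ignores options(mc.cores) and future::plan().
max_threads <- 15L  # hypothetical limit, matching the value used in the script

if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  # Cap the threads used by the BLAS library R is linked against
  RhpcBLASctl::blas_set_num_threads(max_threads)
  # Cap the threads used by OpenMP-enabled code
  RhpcBLASctl::omp_set_num_threads(max_threads)
} else {
  message("RhpcBLASctl is not installed; the thread limit was not applied.")
}

# ... then load Seurat and call RunPCA() as in the original script
```

If this has no effect, sessionInfo() reports which BLAS and LAPACK libraries R is using, which can help narrow down where the threads come from.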

As an aside, it's pretty striking that you are trying to find 800 principal components - why so many?

Tongbcc commented 8 months ago

Thank you for your suggestion. I have tried passing approx = FALSE to RunPCA, but it didn't work. I chose to use 800 PCs based on the method described by Mah, J.L. & Dunn, C.W. in their paper "Cell type evolution reconstruction across species through cell phylogenies of single-cell RNA sequencing data". Using a larger number of PCs can better preserve the evolutionary relationships between cell types when constructing the phylogenetic tree of cell types using continuous variables.