jinworks / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell and spatially resolved transcriptomics
GNU General Public License v3.0

Is there a way to speed up CellChat? #167

Open · micosacak opened this issue 3 weeks ago

micosacak commented 3 weeks ago

I know there is a similar question, but its answer does not solve my problem. The following function takes more than 11 hours to run:

library(CellChat)
library(future)
library(Matrix) # provides the "CsparseMatrix" class used below

getCellChat = function(seurat_object, the.ccDB, the_label_column = "seurat_clusters"){
    # the.ccDB is a CellChatDB object.
    # the_label_column is a column name in the meta.data of the Seurat object.
    options(stringsAsFactors = FALSE)
    future::plan("multisession", workers = 64) # run in parallel with 64 background R sessions
    data.input = as(seurat_object@assays$RNA$data, "CsparseMatrix")
    meta = seurat_object@meta.data 
    meta$labels = meta[, the_label_column] # copy the chosen label column into "labels"
    cell.use = rownames(meta)
    data.input = data.input[, cell.use]
    meta = meta[cell.use, ]
    #unique(meta$labels) # check the cell labels
    the.cellChat <- createCellChat(object = data.input, meta = meta, group.by = "labels")
    #
    the.cellChat <- addMeta(the.cellChat, meta = meta)
    the.cellChat <- setIdent(the.cellChat, ident.use = "labels") # set "labels" as default cell identity
    #levels(the.cellChat@idents) # show factor levels of the cell labels
    groupSize <- as.numeric(table(the.cellChat@idents)) # number of cells in each cell group
    the.cellChat@DB <- the.ccDB
    #
    # subset the expression data of signaling genes for saving computation cost
    the.cellChat <- subsetData(the.cellChat) # This step is necessary even if using the whole database
    the.cellChat <- identifyOverExpressedGenes(the.cellChat)
    the.cellChat <- identifyOverExpressedInteractions(the.cellChat)
    the.cellChat <- computeCommunProb(the.cellChat, type = "triMean")
    the.cellChat <- filterCommunication(the.cellChat, min.cells = 10)
    the.cellChat <- computeCommunProbPathway(the.cellChat)
    #
    the.cellChat <- aggregateNet(the.cellChat)
    the.cellChat <- netAnalysis_computeCentrality(the.cellChat, slot.name = "netP")
    return(the.cellChat)
}

I have 64 CPUs, 500 GB of RAM and around 11,000 cells. Two steps in particular, computeCommunProb and computeCommunProbPathway, take most of the time.
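To confirm which steps dominate the runtime, each call can be wrapped in `system.time()`. The helper below, `time_steps`, is a hypothetical sketch (not part of CellChat): it applies a named list of one-argument functions in order and records the wall-clock seconds of each. The commented CellChat usage assumes an object that has already passed `identifyOverExpressedInteractions`.

```r
# Hypothetical helper (not part of CellChat): apply a named list of
# pipeline steps in order and record the elapsed wall-clock time of each.
time_steps <- function(obj, steps) {
  timings <- numeric(0)
  for (step_name in names(steps)) {
    # The assignment inside system.time() updates obj in this function's frame.
    elapsed <- system.time(obj <- steps[[step_name]](obj))[["elapsed"]]
    timings[step_name] <- elapsed
    message(sprintf("%-28s %8.1f s", step_name, elapsed))
  }
  list(object = obj, timings = timings)
}

# Toy run with stand-in steps:
toy_run <- time_steps(1:10, list(
  square   = function(x) x^2,
  slow_sum = function(x) { Sys.sleep(0.2); sum(x) }
))
toy_run$object

# Possible use on a CellChat object (assumed to be `the.cellChat`):
# res <- time_steps(the.cellChat, list(
#   computeCommunProb        = function(x) computeCommunProb(x, type = "triMean"),
#   filterCommunication      = function(x) filterCommunication(x, min.cells = 10),
#   computeCommunProbPathway = computeCommunProbPathway
# ))
```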

sqjin commented 3 weeks ago

@micosacak How many cell groups are there in your data? I think you should set future::plan("multisession", workers = **4**) instead.
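A minimal sketch of the suggested setting, assuming the future package: `workers = 4` starts four background R sessions in total. One plausible reason fewer workers can be faster is that each multisession worker is a separate R process, so the data every parallel task needs must be serialized and copied into it; with 64 workers that transfer and the extra memory can outweigh the added parallelism. The `future.globals.maxSize` value is an illustrative assumption, only needed if future reports that the exported globals exceed its size limit.

```r
library(future)

# Four background R sessions in total; the main session only coordinates them.
plan(multisession, workers = 4)
stopifnot(nbrOfWorkers() == 4)

# Assumed value: raise the per-future export limit (in bytes) only if
# future complains that the total size of the globals is too large.
options(future.globals.maxSize = 8 * 1024^3)
```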

micosacak commented 3 weeks ago

I have 133 cell groups. What happens if we set workers = 4? Does it split the 64 CPUs into 16 parallel computations, each with 4 CPUs?

sqjin commented 3 weeks ago

@micosacak This is indeed a lot of cell groups. When setting workers = 4, it will run with 4 parallel computations.
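The meaning of `workers` can be checked directly with the future package: the sketch below launches eight small tasks under `workers = 4` and records the process ID of the R session that ran each one. It is a toy check, not CellChat code; it shows there are at most four worker processes in total, rather than 16 groups of four CPUs.

```r
library(future)

plan(multisession, workers = 4)

# Launch 8 tasks; each returns the process ID of the R session that ran it.
# When all 4 workers are busy, future() waits for one to become free.
tasks <- lapply(1:8, function(i) future(Sys.getpid()))
pids <- vapply(tasks, value, integer(1))

length(unique(pids))   # at most 4 distinct worker processes
Sys.getpid() %in% pids # FALSE: no task ran in the main session

plan(sequential) # shut the background workers down again
```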

micosacak commented 3 weeks ago

I have 64 CPUs. Should I then set workers = 64 or workers = 4? That is still not clear to me.
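Because the best worker count depends on the data and the machine, one practical way to decide is to time a representative step under a few settings. The helper `benchmark_workers` below is hypothetical (not part of CellChat or future), and `toy_step` only stands in for a real step; the commented line assumes a smaller pilot CellChat object named `pilot` and that the step parallelises through the active future plan.

```r
library(future)

# Hypothetical helper: run the same function under several worker counts
# and return the elapsed wall-clock seconds for each.
benchmark_workers <- function(fun, worker_counts) {
  on.exit(plan(sequential), add = TRUE) # stop the workers when done
  elapsed <- vapply(worker_counts, function(n) {
    plan(multisession, workers = n)
    system.time(fun())[["elapsed"]]
  }, numeric(1))
  data.frame(workers = worker_counts, seconds = elapsed)
}

# Stand-in workload: 8 parallel tasks that each sleep briefly.
toy_step <- function() {
  tasks <- lapply(1:8, function(i) future({ Sys.sleep(0.25); i }))
  sum(vapply(tasks, value, integer(1)))
}

res <- benchmark_workers(toy_step, c(1, 4))
print(res)

# Possible real use on a pilot subset of cells (assumed object `pilot`):
# benchmark_workers(function() computeCommunProb(pilot, type = "triMean"),
#                   c(4, 8, 16, 32))
```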