sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0

Running CellChat with a large dataset #244

Open daccachejoe opened 3 years ago

daccachejoe commented 3 years ago

Hi, I am trying to run CellChat with a large dataset ~100,000 cells but the 'computeCommunProb' step repeatedly runs into memory issues. The object is already downsampled by identity class within Seurat down from 500,000 cells so I would like to not have to downsample the object further.

Do any options exist within the pipeline to decrease the memory requirement and make the algorithm more scalable? I am running the analysis on 120 cores with 400 GB of memory available.

Thanks!

sqjin commented 3 years ago

@jad362 Thanks for pointing out this issue. I think the cause is the calculation of the mean value per cell group. Can you try the following:

1) Check whether you have the do.fast = TRUE option in the function computeCommunProb. If not, please update your CellChat package.
2) Do not run in parallel, i.e. do not run future::plan("multiprocess", workers = 4).
3) Set nboot = 20, which runs fewer permutations.

daccachejoe commented 3 years ago

Hi @sqjin thanks for your quick reply. I originally had do.fast = TRUE and also tried running not in parallel and those changes did not solve the issue, but setting nboot = 20 had it work right away, so thanks! Could you provide some insight on the drawbacks of decreasing that permutation parameter and how much confidence can I continue to have in the results?

Thanks

daccachejoe commented 3 years ago

Also this is an unrelated question, but is there a method to group cells by a meta data variable not used for the analysis itself? For example create a summary plot of myeloid cell interactions as a whole with Endothelial cells when the analysis itself was run on more specific meta data variables?

thanks!

sqjin commented 3 years ago

@jad362 Please check the tutorial for the following code:

group.cellType <- c(rep("FIB", 4), rep("DC", 4), rep("TC", 4))
group.cellType <- factor(group.cellType, levels = c("FIB", "DC", "TC"))
object.list <- lapply(object.list, function(x) {mergeInteractions(x, group.cellType)})
cellchat <- mergeCellChat(object.list, add.names = names(object.list))

daccachejoe commented 3 years ago

Thanks, that works! Another note: when performing a Bonferroni correction on the ligand-receptor results (as stored in cellchat@net), would I divide by the total number of ligand-receptor pairs in the database I used, or by the number of interactions I was modeling (i.e. the total number of cell type combinations)?

Thanks!

sqjin commented 3 years ago

@jad362 I am thinking it should be the total number of L-R pairs, which is similar to differential expression analysis, where you divide by the total number of genes.
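To make the two choices of denominator concrete, here is a minimal base R sketch. It assumes the p-values are held in a 3-D array indexed as sender group x receiver group x L-R pair; the array here is toy random data, and the names pval, n_groups and n_lr are only for illustration, not values read from a real CellChat object.

```r
# Toy p-value array: sender group x receiver group x L-R pair (made-up data).
set.seed(42)
n_groups <- 3
n_lr <- 5
pval <- array(runif(n_groups * n_groups * n_lr),
              dim = c(n_groups, n_groups, n_lr))

# Option 1: Bonferroni with the number of L-R pairs as the number of tests.
p_adj_lr <- pmin(pval * n_lr, 1)

# Option 2: Bonferroni over every tested entry (group pairs x L-R pairs).
# p.adjust() returns a plain vector, so the array shape is restored afterwards.
p_adj_all <- array(p.adjust(pval, method = "bonferroni"), dim = dim(pval))

# Number of entries still below 0.05 under each choice.
cat("L-R pairs as denominator:   ", sum(p_adj_lr < 0.05), "\n")
cat("all entries as denominator: ", sum(p_adj_all < 0.05), "\n")
```

Option 2 is always at least as strict as option 1, because it multiplies each p-value by a larger number of tests.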

For your last question 'but setting nboot = 20 had it work right away, so thanks! Could you provide some insight on the drawbacks of decreasing that permutation parameter and how much confidence can I continue to have in the results?'

I think the results will not change too much. If nboot = 100, then thresh = 0.05 means there are five permutations having larger communication probabilities. If nboot = 20, then thresh = 0.05 means there is one permutation having a larger communication probability.
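The arithmetic above can be sketched in base R. This is only an illustration of a generic permutation p-value, assuming the p-value is the fraction of permuted values strictly larger than the observed one; perm_pvalue and allowed_exceedances are hypothetical helpers, and the data are random stand-ins rather than CellChat output.

```r
# Hypothetical helper: fraction of permuted statistics larger than the observed one.
perm_pvalue <- function(observed, permuted) {
  mean(permuted > observed)
}

# Hypothetical helper: how many permutations correspond to a given threshold.
allowed_exceedances <- function(nboot, thresh = 0.05) {
  thresh * nboot
}

set.seed(1)
observed <- 2.5                      # made-up observed statistic
for (nboot in c(20, 100)) {
  permuted <- rnorm(nboot)           # stand-in for permuted statistics
  p <- perm_pvalue(observed, permuted)
  cat(sprintf("nboot = %3d: p = %.2f, p-value step = %.2f, exceedances at thresh 0.05 = %g\n",
              nboot, p, 1 / nboot, allowed_exceedances(nboot)))
}
```

The practical drawback of a small nboot is resolution: p-values can only move in steps of 1/nboot, so with nboot = 20 a single permutation changes the p-value by 0.05.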

Fatomk11295 commented 1 year ago

@sqjin Can you please explain this code for me? I am new to R and I can't understand how to apply this code to my data.

group.cellType <- c(rep("FIB", 4), rep("DC", 4), rep("TC", 4))
group.cellType <- factor(group.cellType, levels = c("FIB", "DC", "TC"))
object.list <- lapply(object.list, function(x) {mergeInteractions(x, group.cellType)})
cellchat <- mergeCellChat(object.list, add.names = names(object.list))

sukks105 commented 1 year ago

@sqjin I am trying to compare the differential number of interactions for clusters A, B, C and D in two different conditions. I have a question on why you used "4" in the code group.cellType <- c(rep("FIB", 4), rep("DC", 4), rep("TC", 4)). I got the following error when I used "4".

My code:

group.cellType <- c(rep("A", 4), rep("B",4), rep("C",4), rep("D",4))
group.cellType <- factor(group.cellType, levels = c("A", "B", "C", "D"))
object.list <- lapply(object.list, function(x) {mergeInteractions(x, group.cellType)})

Error: Error in count[group.merged == i, group.merged == j] : (subscript) logical subscript too long

However, when I used 1 or 2 instead of 4, as in group.cellType <- c(rep("A", 1), rep("B",1), rep("C",1), rep("D",1)) or group.cellType <- c(rep("A", 2), rep("B",2), rep("C",2), rep("D",2)), I no longer get the error but get completely different results.

Would you please clarify on this?

sqjin commented 1 year ago

@sukks105 My data have four subclusters of FIB, so I group them into one cell type.
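As a rough illustration of why the number in rep() matters, here is a base R sketch of merging an interaction count matrix by group. It is not CellChat's implementation: merge_counts is a hypothetical helper, and the 12 x 12 matrix is toy data standing in for 12 subclusters (4 FIB, 4 DC, 4 TC). The assumption is that the grouping vector needs one entry per original cell group, in the same order as the rows and columns of the count matrix.

```r
# Hypothetical helper: sum interaction counts within merged groups.
# `count` is a square matrix (one row/column per original cell group);
# `group.merged` assigns each original group to a merged cell type.
merge_counts <- function(count, group.merged) {
  if (length(group.merged) != nrow(count) || nrow(count) != ncol(count)) {
    stop("group.merged needs exactly one entry per original cell group")
  }
  group.merged <- factor(group.merged)   # keeps the level order of a factor input
  lv <- levels(group.merged)
  out <- matrix(0, length(lv), length(lv), dimnames = list(lv, lv))
  for (i in lv) {
    for (j in lv) {
      out[i, j] <- sum(count[group.merged == i, group.merged == j])
    }
  }
  out
}

# Toy data: 12 subclusters with one interaction between every pair.
count <- matrix(1, nrow = 12, ncol = 12)
group.cellType <- c(rep("FIB", 4), rep("DC", 4), rep("TC", 4))
group.cellType <- factor(group.cellType, levels = c("FIB", "DC", "TC"))
merged <- merge_counts(count, group.cellType)
print(merged)

# A grouping vector of the wrong length is rejected instead of being indexed.
too_long <- rep(c("A", "B", "C", "D"), each = 4)   # 16 entries for 4 clusters
msg <- tryCatch(merge_counts(matrix(1, 4, 4), too_long),
                error = function(e) conditionMessage(e))
print(msg)
```

Without the length check, a logical index longer than the matrix dimension fails with "logical subscript too long", while a shorter one is silently recycled by R, which would give results that run but do not mean what was intended.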

sukks105 commented 1 year ago

Thanks a lot for the clarification!

luongthang1908 commented 8 months ago

Hi @sqjin, can I keep nboot = 100 and run computeCommunProb in parallel? Thanks,

sqjin commented 8 months ago

@luongthang1908 Yes, you can. You can also perform subsampling before running CellChat.
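One possible way to subsample is to cap the number of cells per cell group, so that rare groups are kept whole. The base R sketch below works on a toy metadata table; subsample_cells, the column name cell_type and the cap of 10 are assumptions for illustration, not part of CellChat or Seurat.

```r
# Hypothetical helper: keep at most `max_per_group` rows per level of `group_col`.
subsample_cells <- function(meta, group_col, max_per_group, seed = 1L) {
  set.seed(seed)
  idx_by_group <- split(seq_len(nrow(meta)), meta[[group_col]])
  keep <- unlist(lapply(idx_by_group, function(idx) {
    if (length(idx) > max_per_group) {
      idx[sample.int(length(idx), max_per_group)]  # random subset of a large group
    } else {
      idx                                          # small groups are kept whole
    }
  }), use.names = FALSE)
  meta[sort(keep), , drop = FALSE]
}

# Toy metadata: 100 cells of one type and 5 of another (made-up labels).
meta <- data.frame(
  cell_id = sprintf("cell_%03d", 1:105),
  cell_type = c(rep("Myeloid", 100), rep("Endothelial", 5)),
  stringsAsFactors = FALSE
)
meta_small <- subsample_cells(meta, "cell_type", max_per_group = 10)
print(table(meta_small$cell_type))
```

The retained cell IDs can then be used to subset the expression matrix and labels before building the CellChat object.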