sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0
640 stars 145 forks source link

Different results from trimean and truncatedMean calculations #564

Open Smeerlap opened 1 year ago

Smeerlap commented 1 year ago

Dear developers,

thanks for creating this great package and the comprehensive, easy-to-follow vignettes! I am fairly new to bioinformatics and single cell analyses and want to use CellChat for a network analysis in inflammation, e.g. how the immune cells interact with resident cells. I am comparing single cell data from healthy wildtype mice with sick mutant mice.

I hope that I understood this correctly: If I change the parameters in the "computeCommunProb" Function from the default "trimean" to "truncatedMean", I can lower the threshold for gene expression that is taken into account for the downstream analyses. Since many of the immune-related transcripts are expected to only be expressed in a small number of cells, I lowered the "trim" for "truncatedMean" to 10 % and compared my results with the default "trimean" calculation. I was hoping you could help me at interpreting the differences in results I get comparing mutant mice with healthy wt mice:

1) Differential # and strength of interactions when calculated with "trimean"

CellChat_Differential_Heatmaps_trimean

2) Differential # and strength of interactions when calculated with "truncatedMean", "trim = 0.1"

CellChat_Differential_Heatmap_truncMean_0 1

The first heatmap was very reassuring because it nicely recapitulates what happens in the mice (podocytes and endothelial cells lose contacts within their own cell populations) as this has been shown that this happens when the disease occurs. Also mesangial cells seem to become major players in the signaling network. At least that's how I interpret the first heatmap. The second heatmap tells a different story: The number of interactions between the podocyte seems to increase rather than decrease in the sick mice, even though the strength of these signals decreases. Mesangial cells do not seem to be as central of a player as they seem to be in the first heat map.

I've had a couple of thoughts about this and was hoping you could share what you think of it:

Here is the code I ran (glom_seu = my Seurat object containing snRNAseq data subsetted for the 4 celltypes I'm looking at; Nphs2 is the genotype of the mutant mice):

# load and preprocess database
CellChatDB <- CellChatDB.mouse
CellChatDB.use <- CellChatDB 
# remove two interactions from database that would cause an error
which(CellChatDB.use[["interaction"]]$ligand == "H2-BI") # 1887
CellChatDB.use[["interaction"]] <- CellChatDB.use[["interaction"]][-1887,]
which(CellChatDB.use[["interaction"]]$ligand == "H2-Ea-ps") #1900
CellChatDB.use[["interaction"]] <- CellChatDB.use[["interaction"]][-1900,]`

# create mutant CellChat and process
cc_seu <- glom_seu %>%
    subset(subset = gtype == "Nphs2" & age == 8)
cellchat.mut <- createCellChat(object = cc_seu, group.by = "cell_group")
cellchat.mut@DB <- CellChatDB.use
cellchat.mut <- subsetData(cellchat.mut)
cellchat.mut <- identifyOverExpressedGenes(cellchat.mut)
cellchat.mut <- identifyOverExpressedInteractions(cellchat.mut)
cellchat.mut@idents <- droplevels(cellchat.mut@idents, 
                                    exclude = setdiff(levels(cellchat.mut@idents),
                                                      unique(cellchat.mut@meta$cell_group)))
cellchat.mut <- computeCommunProb(cellchat.mut, 
                                    population.size = TRUE, 
                                    type = "truncatedMean",
                                    trim = 0.1)
 cellchat.mut <- filterCommunication(cellchat.mut, 
                                      min.cells = 10)
 cellchat.mut <- computeCommunProbPathway(cellchat.mut)
 cellchat.mut <- aggregateNet(cellchat.mut)

# create wt CellChat and process
cc_seu <- glom_seu %>%
    subset(subset = gtype == "wt")

wt_cellchat <- createCellChat(object = cc_seu, group.by = "cell_group")
wt_cellchat@DB <- CellChatDB.use
wt_cellchat <- subsetData(wt_cellchat)
wt_cellchat <- identifyOverExpressedGenes(wt_cellchat)
wt_cellchat <- identifyOverExpressedInteractions(wt_cellchat)
wt_cellchat@idents <- droplevels(wt_cellchat@idents, 
                                    exclude = setdiff(levels(wt_cellchat@idents),
                                                      unique(wt_cellchat@meta$cell_group)))
wt_cellchat <- computeCommunProb(wt_cellchat, 
                                 population.size = TRUE,
                                 type = "truncatedMean",
                                 trim = 0.1)
wt_cellchat <- filterCommunication(wt_cellchat, 
                                      min.cells = 10)
wt_cellchat <- computeCommunProbPathway(wt_cellchat)
wt_cellchat <- aggregateNet(wt_cellchat)

# merge CellChat Objects
object.list <- list(wt = wt_cellchat, mut = cellchat.mut)
cellchat.merge <- mergeCellChat(object.list, add.names = names(object.list))
cellchat.merge

# create heatmaps
gg1 <- netVisual_heatmap(cellchat.merge)
gg2 <- netVisual_heatmap(cellchat.merge, measure = "weight")
gg1 + gg2

For "trimean" calculation I ran the same code, just deleting the "type = "truncatedMean", trim = 0.1" section in both pipelines.

Here is my SessionInfo()

R version 4.2.2 Patched (2022-11-10 r83330) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=de_DE.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] ComplexHeatmap_2.14.0 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 ggalluvial_0.12.3 NMF_0.25
[7] cluster_2.1.4 rngtools_1.5.2 registry_0.5-1 patchwork_1.1.2 CellChat_1.6.1 Biobase_2.58.0
[13] BiocGenerics_0.44.0 igraph_1.3.5 data.table_1.14.6 forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10
[19] purrr_1.0.1 readr_2.1.3 tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
[25] SeuratObject_4.1.3 Seurat_4.3.0

loaded via a namespace (and not attached): [1] utf8_1.2.2 spatstat.explore_3.0-6 reticulate_1.28 tidyselect_1.2.0 htmlwidgets_1.6.1
[6] BiocParallel_1.32.5 Rtsne_0.16 munsell_0.5.0 ragg_1.2.5 codetools_0.2-18
[11] ica_1.0-3 future_1.30.0 miniUI_0.1.1.1 withr_2.5.0 spatstat.random_3.1-3 [16] colorspace_2.1-0 progressr_0.13.0 knitr_1.42 rstudioapi_0.14 stats4_4.2.2
[21] ROCR_1.0-11 ggsignif_0.6.4 tensor_1.5 listenv_0.9.0 labeling_0.4.2
[26] polyclip_1.10-4 farver_2.1.1 coda_0.19-4 parallelly_1.34.0 vctrs_0.5.2
[31] generics_0.1.3 xfun_0.36 timechange_0.2.0 R6_2.5.1 clue_0.3-63
[36] spatstat.utils_3.0-1 assertthat_0.2.1 promises_1.2.0.1 scales_1.2.1 googlesheets4_1.0.1
[41] gtable_0.3.1 Cairo_1.6-0 globals_0.16.2 goftest_1.2-3 rlang_1.0.6
[46] systemfonts_1.0.4 GlobalOptions_0.1.2 splines_4.2.2 rstatix_0.7.1 lazyeval_0.2.2
[51] gargle_1.2.1 spatstat.geom_3.0-5 broom_1.0.3 BiocManager_1.30.19 yaml_2.3.7
[56] reshape2_1.4.4 abind_1.4-5 modelr_0.1.10 ggnetwork_0.5.10 backports_1.4.1
[61] httpuv_1.6.8 tools_4.2.2 gridBase_0.4-7 statnet.common_4.8.0 ellipsis_0.3.2
[66] RColorBrewer_1.1-3 ggridges_0.5.4 Rcpp_1.0.10 plyr_1.8.8 ggpubr_0.5.0
[71] deldir_1.0-6 pbapply_1.7-0 GetoptLong_1.0.5 cowplot_1.1.1 S4Vectors_0.36.1
[76] zoo_1.8-11 haven_2.5.1 ggrepel_0.9.2 fs_1.6.0 magrittr_2.0.3
[81] magick_2.7.3 RSpectra_0.16-1 sna_2.7-1 scattermore_0.8 circlize_0.4.15
[86] lmtest_0.9-40 reprex_2.0.2 RANN_2.6.1 googledrive_2.0.0 fitdistrplus_1.1-8
[91] matrixStats_0.63.0 hms_1.1.2 mime_0.12 evaluate_0.20 xtable_1.8-4
[96] readxl_1.4.1 IRanges_2.32.0 gridExtra_2.3 shape_1.4.6 compiler_4.2.2
[101] KernSmooth_2.23-20 crayon_1.5.2 htmltools_0.5.4 later_1.3.0 tzdb_0.3.0
[106] lubridate_1.9.1 DBI_1.1.3 dbplyr_2.3.0 MASS_7.3-58.2 Matrix_1.5-3
[111] car_3.1-1 cli_3.6.0 pkgconfig_2.0.3 sp_1.6-0 plotly_4.10.1
[116] spatstat.sparse_3.0-0 xml2_1.3.3 svglite_2.1.1 rvest_1.0.3 digest_0.6.31
[121] sctransform_0.3.5 RcppAnnoy_0.0.20 spatstat.data_3.0-0 rmarkdown_2.20 cellranger_1.1.0
[126] leiden_0.4.3 uwot_0.1.14 shiny_1.7.4 rjson_0.2.21 lifecycle_1.0.3
[131] nlme_3.1-161 jsonlite_1.8.4 network_1.18.1 carData_3.0-5 BiocNeighbors_1.16.0
[136] viridisLite_0.4.1 fansi_1.0.4 pillar_1.8.1 lattice_0.20-45 fastmap_1.1.0
[141] httr_1.4.4 survival_3.5-0 glue_1.6.2 FNN_1.1.3.1 png_0.1-8
[146] stringi_1.7.12 textshaping_0.3.6 irlba_2.3.5.1 future.apply_1.10.0

sqjin commented 1 year ago

@JasperNies You may consider to set population.size = FALSE as default.

"Maybe me lowering the threshol just takes into account more very weakly expressed signals in the podocyte cluster which increases the number of potential interactions, but these interactions are of low confidence / **strength." YOUR UNDERSTANDING IS CORRECT.

"Why does the total number of differential interactions with the mesangial cells being the senders" IT IS POSSIBLE THAT THERE ARE MORE INTERACTIONS IN CONTROL WHEN LOWERING THE THRESHOLD.

Smeerlap commented 1 year ago

@sqjin Thank you so much for your quick help! Would you might telling me why population.size = FALSE is the default setting anyways? My though process was to set this to TRUE since my populations vary considerably in size (immune cells are way less frequent than the other cells types, almost one order of magnitude), so I thought that this should be taken into consideration when calculating the network activities.