sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0
621 stars 142 forks source link

Getting adjusted p values CellChat #706

Open jv-20 opened 11 months ago

jv-20 commented 11 months ago

Hi @sqjin

Couple of quick questions.

Do we need to do multiple testing to obtain adjusted pvalues / the p-value reported with cellchat identifications of interactions in the individual dataset is sufficient?

Also in terms of identifying dysregulated signalling when comparing two datasets, would you be able to advise the code to how to get adjusted pvalues? Likewise is multiple testing needed to be done here as additional step to filter out false positives?

# define a positive dataset, i.e., the dataset with positive fold change against the other dataset
pos.dataset = "LS"
# define a char name used for storing the results of differential expression analysis
features.name = pos.dataset
# perform differential expression analysis
cellchat <- identifyOverExpressedGenes(cellchat, group.dataset = "datasets", pos.dataset = pos.dataset, features.name = features.name, only.pos = FALSE, thresh.pc = 0.1, thresh.fc = 0.1, thresh.p = 1)
#> Use the joint cell labels from the merged CellChat object
# map the results of differential expression analysis onto the inferred cell-cell communications to easily manage/subset the ligand-receptor pairs of interest
net <- netMappingDEG(cellchat, features.name = features.name)
# extract the ligand-receptor pairs with upregulated ligands in LS
net.up <- subsetCommunication(cellchat, net = net, datasets = "LS",ligand.logFC = 0.2, receptor.logFC = NULL)
# extract the ligand-receptor pairs with upregulated ligands and upregulated recetptors in NL, i.e.,downregulated in LS
net.down <- subsetCommunication(cellchat, net = net, datasets = "NL",ligand.logFC = -0.1, receptor.logFC = -0.1)

Thanks

sqjin commented 11 months ago

@jv-20 The statistical test is used to find the interactions that are over-represented in certain cell groups. Thus we think it is sufficient. In addition, since we only permute the cell labels 100 times, it is not suitable for further multiple testing. Finally, based on recent benchmarking study (one paper published in Genome Biol 2023), our method ranks on the top.

For the differential expression analysis, we do already perform the multiple testing using '"bonferroni"' method. You can check the source codes if you want.

jv-20 commented 11 months ago

@sqjin Many thanks for the reply.

One last question. So for example, the ligand.pvalues obtained using this function and code below is the adjusted.p.value?

net <- netMappingDEG(cellchat, features.name = features.name)
# extract the ligand-receptor pairs with upregulated ligands in LS
net.up <- subsetCommunication(cellchat, net = net, datasets = "LS",ligand.logFC = 0.2, receptor.logFC = NULL)

Thanks

jv-20 commented 11 months ago

@sqjin , also would you be able to advise how to get logFC data frame for all signalling genes across all cell types comparing two datasets. Not just the significant ones.

If I use,

pos.dataset = "d1"
# define a char name used for storing the results of differential expression analysis
features.name = pos.dataset

# perform differential expression analysis
cellchat <- identifyOverExpressedGenes(cellchat.HC_CDI, group.dataset = "datasets", pos.dataset = pos.dataset, features.name = features.name, only.pos = FALSE, thresh.pc = 0, thresh.fc = 0, thresh.p = 1)

#> Use the joint cell labels from the merged CellChat object # map the results of differential expression analysis onto the inferred cell-cell communications to easily manage/subset the ligand-receptor pairs of interest
net <- netMappingDEG(cellchat, features.name = features.name,thresh = 1)

#trying to create a logFC dataframe for ligands
net.d1=net%>%filter(datasets=="d1")
net.d1_long=net.d1%>%select(source,ligand,ligand.pvalues,ligand.logFC)%>%arrange(source)%>%distinct(source,ligand,.keep_all=T)

net.d1_wide=reshape(net.d1_long,idvar="ligand",timevar="source",direction="wide")
net.d1_wide.logFC=net.d1_wide[,grepl("logFC",colnames(net.d1_wide))]
net.d1_wide.pvalues=net.d1_wide[,grepl("pvalues",colnames(net.d1_wide))]

I get information on genes for certain cell types. Some of the cell types show "NA" values. Is the NA values shown for certain cell types are because these cell types sufficiently doesn't express these genes to be included in the analyses so coded as NA?

Is it possible to get information on logFC and adjusted pvalues for all signalling genes across all celltypes?

The bonferroni adjusted p values obtained using

cellchat@var.features$d1.info

returns only handful of genes as significant. It seems to over correct. Would you have any suggestions?

Thanks

sqjin commented 11 months ago

@jv-20 In this case, you need to modify the source code of identifyOverExpressedGenes

jv-20 commented 11 months ago

Thanks @sqjin It is little confusing. Could you please help in clarifying which lines of codes to modify in identifyOverExpressedGenes

Thank you