sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0
610 stars 140 forks source link

Seemingly incorrect differential analysis #342

Open GBeattie opened 2 years ago

GBeattie commented 2 years ago

Hey, great package!

I am performing the differential analysis as described in your vignette on my own data. I'm presuming something is missing in my understanding of the outputs, but they are difficult to interpret. Here is my output:

image

I would have expected the mCherry+ plot (top) to only include pathways where the Commun. Prob is higher in (+) than in (-). Both plots also include mostly the same pathways, which I would think shouldn't be the case since I've pulled out the top logFC changes for the top plot and the bottom logFC changed for the bottom plot.

Any assistance would be greatly appreciated, I've included the relevant part of my code below in case that's part of the issue.

Thanks!

Gordon

cc <- identifyOverExpressedGenes(cc, group.dataset = "datasets", pos.dataset = "+", only.pos = FALSE, thresh.pc = 0.1, thresh.fc = 0.1, thresh.p = 1)
net <- netMappingDEG(cc, features.name = "features")
net.up <- subsetCommunication(cc, net = net, datasets = "+", ligand.logFC = 0.5, receptor.logFC = 0.5)
net.down <- subsetCommunication(cc, net = net, datasets = "-", ligand.logFC = -0.5, receptor.logFC = -0.5)

cc.diff.bubble.list <- list(
  netVisual_bubble(cc, pairLR.use = net.up[, "interaction_name", drop = F], comparison = c(1, 2), sources.use = "Mac Int", targets.use = "AT1",  angle.x = 45, remove.isolate = T, title.name = "Up-regulated signaling in mCherry +"),
  netVisual_bubble(cc, pairLR.use = net.down[, "interaction_name", drop = F], comparison = c(1, 2), sources.use = "Mac Int", targets.use = "AT1", angle.x = 45, remove.isolate = T, title.name = "Up-regulated signaling in mCherry -"))

plot_grid(plotlist = cc.diff.bubble.list, ncol = 1)
tomthomas3000 commented 2 years ago

facing same issue as above!

sqjin commented 2 years ago

@tomthomas3000 Can you check the obtained net.up and net.down to see whether they look good? Let's make sure this is not an issue caused by the function netVisual_bubble

sqjin commented 2 years ago

@GBeattie I am wondering if you have addressed this issue?

GBeattie commented 2 years ago

@sqjin I'm afraid not, I'm currently relying on generating a fold change comparison between communication probabilities in each condition to find differences, which likely isn't ideal.

tomthomas3000 commented 2 years ago

@sqjin net.up and net.down appears to look good, at least when I sort by FC, the ligand and receptors do make sense. I think the problem might lie with the extractGeneSubsetFromPair function producing erroneous gene.up and gene.down lists potentially. Although cannot exclude problems with netVisual_bubble. I say this because - my gene.up list has 397 genes. and my gene.down list has 312 genes. Very confusingly - there are 238 genes that are shared across both lists - hence almost appearing to contradict each other, especially with my biological q!!

I also just want to double check my interpretation of the subsetCommunication function:

net.up <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = 0.2, receptor.logFC = NULL)

This would essentially isolate ligand FC > 0.2 in condition A hence in net.up

net.down <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = -0.1, receptor.logFC = -0.1)

This is slightly confusing. I have applied the same template as the vignette - does this extract the ligand-receptor pairs with upregulated ligands and upregulated receptors in condB, on the premise that those would be downregulated in condA hence why ligand and receptor logFC are negative? I suppose the confusion is because of the [datasets="condB"] bit. Hence it almost looks like we are extracting negative FC in ligands and receptors from condB. Can you clarify why simply doing - [net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1)] would be insufficient to identify downregulated ligands and receptors in condA?

@GBeattie can you check if you are facing a similar issue i.e. the extractGeneSubsetFromPair function is producing erroneous lists - in your case, this function applied to your dataset will suggest overlapping ligands/receptors between mCherry+ and mCherry-

tomthomas3000 commented 2 years ago

@sqjin I suppose the other factor here is that there could be a typo in the actual vignette i.e. "#extract the ligand-receptor pairs with upregulated ligands and upregulated recetptors in NL, i.e.,downregulated in LS" net.down <- subsetCommunication(cellchat, net = net, datasets = "NL",ligand.logFC = -0.1, receptor.logFC = -0.1) should instead be: net.down <- subsetCommunication(cellchat, net = net, datasets = "LS",ligand.logFC = -0.1, receptor.logFC = -0.1)?

GBeattie commented 2 years ago

@tomthomas3000 Just to confirm yes I do find the same interactions in net.up and net.down. I've tried playing with subsetCommunication and can't quite see any pattern as to why it's returning the same interactions in each list. At the moment I'm just going the route of using netVisual_bubble with the comparison option set and looking for differences, it also has an option to return the data frame so I could filter to show only differences if I choose.

tomthomas3000 commented 2 years ago

@GBeattie just to clarify the net.up and net.down shouldnt be having overlapping interactions (I think this part is running ok - if you sort by ligand FC etc., it should look ok). But once you run the extractGeneSubsetFromPair, then your list of genes might overlap. Is that the case?

GBeattie commented 2 years ago

@tomthomas3000 Both the interactions and the genes themselves overlap unsurprisingly, see below for my working. I wouldn't be as concerned over a little overlap between the genes, since individual genes could be involved in different interactions that could be different between the samples.

intersect(net.up$interaction_name_2, net.down$interaction_name_2)
[1] "Ccl5  - Ccr1"   "Ccl6  - Ccr2"   "Thbs1  - Cd47"  "Cadm1  - Cadm1"
intersect(extractGeneSubsetFromPair(net.up, cc), extractGeneSubsetFromPair(net.down, cc))
 [1] "Ccl5"  "Ccl6"  "Spp1"  "C3"    "Thbs1" "Cadm1" "Icam1" "Flt1"  "Ccr1"  "Ccr2"  "Sdc4"  "Axl"   "Cd47"  "Itgav"
[15] "Cd44"  "Itgb1"
tomthomas3000 commented 2 years ago

@GBeattie the results i get are more in keeping with the image posted in the issue - where there is significant overlap. [my gene.up list has 397 genes. and my gene.down list has 312 genes. Very confusingly - there are 238 genes that are shared across both lists] - these are two opposing conditions, hence why it would be quite confusing why those pathways would be up in both conditions! I wonder if there is anything wrong with simply doing [net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1)]

sqjin commented 2 years ago

@tomthomas3000 The tutorial on this part is correct. When you merged your two cellchat objects, is the order as follows: "condA" and "condB"? If so, when identifying up-regulated interactions in condB compared to condA, you should run net.up <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = 0.2, receptor.logFC = 0.2). When identifying down-regulated interactions in condB compared to condA, you should run net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1).

When identifying down-regulated interactions in condB compared to condA, ligand.logFC = -0.1 and receptor.logFC = -0.1 will use the information from differential expression analysis to only retain the ligands and receptors that are down-regulated in condB compared to condA. datasets = "condA" indicates that we will retain the interactions in condA that are down-regulated in condB compared to condA. Assume that CellChat predicts WNT signaling in condA, but not in condB. Then we expect to identify WNT as down-regulated in condB compared to condA. Thus, we should set datasets = "condA".

Please let me know if you still feel confusing.

erikagucciardo commented 1 year ago

Following this thread. So, is the order in which we merge the cell chat object determining how we use these net.up and net.down formulas? What is the use of setting the pos.dataset then?

sqjin commented 1 year ago

@erikagucciardo The order will not affect the results because we use the pos.dataset = "LS" instead of the index.

Jaimomar99 commented 1 year ago

Hi there, Let me follow the thread and try to clarify the issue since I've been also struggling with this difference. So I have two datasets, let's say healthy and disease. I set pos.dataset as "disease". pos.dataset = disease Then, according to the vignette and this thread if I want to check up-regulated interaction in disease compared with healthy, it would be: net.up <- subsetCommunication(cellchat_object, net = net, datasets = "disease", ligand.logFC = 0.1, receptor.logFC = 0.1) On the other hand, if I want to see down-regulated interactions in the disease compared to healthy, it would be: net.down <- subsetCommunication(cellchat_object, net = net, datasets = "healthy" ,ligand.logFC = -0.1, receptor.logFC = -0.1).

So far, so good. But then maybe the discussion is when I turn them around: subsetCommunication(cellchat_object, net = net, datasets = "healthy", ligand.logFC = 0.1, receptor.logFC = 0.1) and subsetCommunication(cellchat_object, net = net, datasets = "disease", ligand.logFC = -0.1, receptor.logFC = -0.1).

In the last two cases, (please correct me if I am wrong), we are checking the up-regulated interactions of healthy compared with disease, and down-regulated interactions of healthy compared with disease. These two last cases is the same as if I put pos.dataset = healthy and I follow the vignette. So in the end, the code depends on the biological question that you are asking.

My most difficult part to understand is that the fact that up-regulated interactions in healthy compare to disease is not the same as down-regulated interactions in disease compared to healthy. Please @sqjin correct me if I am wrong, but I think this is the question that you have to consider in the analysis @tomthomas3000, @GBeattie

sqjin commented 1 year ago

@Jaimomar99 You are right that you will get different results in the last two cases.

When running subsetCommunication(cellchat_object, net = net, datasets = "healthy", ligand.logFC = 0.1, receptor.logFC = 0.1), you will subset the interactions from the healthy condition.

When running subsetCommunication(cellchat_object, net = net, datasets = "disease", ligand.logFC = -0.1, receptor.logFC = -0.1), you will subset the interactions from the disease condition.

Tjcyz commented 8 months ago

@tomthomas3000 The tutorial on this part is correct. When you merged your two cellchat objects, is the order as follows: "condA" and "condB"? If so, when identifying up-regulated interactions in condB compared to condA, you should run net.up <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = 0.2, receptor.logFC = 0.2). When identifying down-regulated interactions in condB compared to condA, you should run net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1).

When identifying down-regulated interactions in condB compared to condA, ligand.logFC = -0.1 and receptor.logFC = -0.1 will use the information from differential expression analysis to only retain the ligands and receptors that are down-regulated in condB compared to condA. datasets = "condA" indicates that we will retain the interactions in condA that are down-regulated in condB compared to condA. Assume that CellChat predicts WNT signaling in condA, but not in condB. Then we expect to identify WNT as down-regulated in condB compared to condA. Thus, we should set datasets = "condA".

Please let me know if you still feel confusing.

but When identifying down-regulated interactions in condB compared to condA, the code: net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1) implies extracting pathways from condition A where receptors and ligands are downregulated, as the chosen criteria here areligand.logFC = -0.1, receptor.logFC = -0.1. In other words, downregulation occurs in condition A implies upregulation in condition B, so should the criteria be set "ligand.logFC = 0.1, receptor.logFC = 0.1 " now to filter out pathways that are downregulated in condition B?

Tjcyz commented 8 months ago

@tomthomas3000 The tutorial on this part is correct. When you merged your two cellchat objects, is the order as follows: "condA" and "condB"? If so, when identifying up-regulated interactions in condB compared to condA, you should run net.up <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = 0.2, receptor.logFC = 0.2). When identifying down-regulated interactions in condB compared to condA, you should run net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1).

When identifying down-regulated interactions in condB compared to condA, ligand.logFC = -0.1 and receptor.logFC = -0.1 will use the information from differential expression analysis to only retain the ligands and receptors that are down-regulated in condB compared to condA. datasets = "condA" indicates that we will retain the interactions in condA that are down-regulated in condB compared to condA. Assume that CellChat predicts WNT signaling in condA, but not in condB. Then we expect to identify WNT as down-regulated in condB compared to condA. Thus, we should set datasets = "condA".

Please let me know if you still feel confusing.

"For your tutorial, I have another question. In this bubble plot regarding upregulated and downregulated pathways, the pathway CXCL12-CXCR4 appears as both upregulated and downregulated in Inflam.FIB > cDC2. How can this be explained?"

Tjcyz commented 8 months ago

@Jaimomar99 You are right that you will get different results in the last two cases.

When running subsetCommunication(cellchat_object, net = net, datasets = "healthy", ligand.logFC = 0.1, receptor.logFC = 0.1), you will subset the interactions from the healthy condition.

When running subsetCommunication(cellchat_object, net = net, datasets = "disease", ligand.logFC = -0.1, receptor.logFC = -0.1), you will subset the interactions from the disease condition.

When running subsetCommunication(cellchat_object, net = net, datasets = "disease", ligand.logFC = -0.1, receptor.logFC = -0.1), it will subset the interactions from the disease condition, I think it is right but I confuse that it will subset the downregulated pathway from the disease condition, not the healthy condition, hence if we want the downregulations of healthy condition , it may be filter by: ligand.logFC = 0.1, receptor.logFC = 0.1. Please correct me if I am wrong.

Tjcyz commented 8 months ago

@sqjin net.up and net.down appears to look good, at least when I sort by FC, the ligand and receptors do make sense. I think the problem might lie with the extractGeneSubsetFromPair function producing erroneous gene.up and gene.down lists potentially. Although cannot exclude problems with netVisual_bubble. I say this because - my gene.up list has 397 genes. and my gene.down list has 312 genes. Very confusingly - there are 238 genes that are shared across both lists - hence almost appearing to contradict each other, especially with my biological q!!

I also just want to double check my interpretation of the subsetCommunication function:

net.up <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = 0.2, receptor.logFC = NULL) This would essentially isolate ligand FC > 0.2 in condition A hence in net.up

net.down <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = -0.1, receptor.logFC = -0.1) This is slightly confusing. I have applied the same template as the vignette - does this extract the ligand-receptor pairs with upregulated ligands and upregulated receptors in condB, on the premise that those would be downregulated in condA hence why ligand and receptor logFC are negative? I suppose the confusion is because of the [datasets="condB"] bit. Hence it almost looks like we are extracting negative FC in ligands and receptors from condB. Can you clarify why simply doing - [net.down <- subsetCommunication(cellchat, net = net, datasets = "condA",ligand.logFC = -0.1, receptor.logFC = -0.1)] would be insufficient to identify downregulated ligands and receptors in condA?

@GBeattie can you check if you are facing a similar issue i.e. the extractGeneSubsetFromPair function is producing erroneous lists - in your case, this function applied to your dataset will suggest overlapping ligands/receptors between mCherry+ and mCherry-

I have the same confusion, I agree with you. I think the code net.down <- subsetCommunication(cellchat, net = net, datasets = "condB",ligand.logFC = -0.1, receptor.logFC = -0.1) looks like is the downregulations of condB not condA. Did you find the answer?