zktuong / ktplots

Some tools for plotting single-cell data

https://zktuong.github.io/ktplots/

MIT License

plot_cpdb question #116

Closed L-wang17 closed 1 month ago

L-wang17 commented 1 month ago

Hi, as the function documentation states, it will only keep significant results on the plot. So why is there another parameter for highlighting significant results? It also seems that the highlighted ones are the truly significant ones, and that they differ from the results in cpdb's significant_means output.

Would you mind clarifying this, please? I am quite confused when I look at the plots. Thank you!

#' @param keep_significant_only logical. Default is TRUE. Switch to FALSE if you want to plot all the results from cpdb.
#' @param highlight_col colour for highlighting p < 0.05
#' @param highlight_size stroke size for highlight if p < 0.05. if NULL, scales to -log10(pval).

zktuong commented 1 month ago

hi @L-wang17,

yes. so the idea is that a significant result in cellphonedb just means the interaction is statistically likely to exist between two celltypes (based on the statistical analysis mode); it is not a test of whether the interaction is more significant in one celltype pair than in another. So I think it still makes sense to look at the same interaction in other celltype pairs to check whether it is or is not expressed at all in those pairs.

In deg analysis mode, the interactions exist because the genes are significant marker genes in the relevant celltypes, but that doesn't mean other celltypes are not expressing the genes; they may just not be significant according to the marker gene tests.

So i prefer having the flexibility to look at the data more broadly and make a more informed decision, e.g. choosing significant interactions that are only present in the specific celltypes of interest (or that clearly dominate there).

hope that helps.

L-wang17 commented 1 month ago

Hi @zktuong,

Thank you for your fast response! Now I understand why there are two messages of "significant" on the graph. Did you use the same p-value twice? As far as I understand, cpdb generates only one p-value with statistical_analysis, and I'm not sure how two conclusions are drawn from the same p-value. Or is the p-value used differently each time <0.05 appears in the code? Would you mind explaining the logic behind it, please?

Additionally, is there a way to include only the doubly significant interactions in the dot plot and chord plot? Or is there a way to use significant_mean.txt in the plot? (I am assuming here that significant_mean.txt contains exactly the same thing as the doubly significant interactions on the graph.) Thank you!

zktuong commented 1 month ago

i'm not sure what you mean by two significant messages - there's only 1 pvalue and that's used in the "significant" key here:

I'm not sure about significant_mean.txt. you can try passing that as means and see if it works? the dimensions should be the same

L-wang17 commented 1 month ago

Yes, besides this one, the other one is what you stated here. In the instructions: "Or don’t specify either and it will try to plot all significant interactions." And in the function:

#' @param keep_significant_only logical. Default is TRUE. Switch to FALSE if you want to plot all the results from cpdb.

[Screenshots: 2024-10-04 210606, 2024-10-04 210707]

My understanding was that this function only keeps interactions with pval<0.05 on the graph and sets all others to NULL, which is why you wrote "try to plot all significant interactions", meaning every bubble on the plot is a real/significant interaction. On top of that, some of the bubbles are highlighted in red, which I took to mean that those real interactions are also significant in this celltype pair compared with other celltype pairs. This is my understanding based on your first reply and the other documentation. That's why I would like to ask what the logic behind it is, and how you used the one p-value twice to generate two layers of significance. I hope my question is clear.

I also tried passing significant_mean.txt as means and it failed. I assume that is because only the significant values are left and there are lots of empty values in significant_mean.txt.

zktuong commented 1 month ago

~oops sorry it's a typo. it just mean plot all interactions.~

Actually, now that i think about it, the sentence you are referring to is about whether a user specifies genes or gene_family. If a user doesn't specify either, it plots all significant interactions (if keep_significant_only = TRUE). It does the filtering softly, i.e. as long as an interaction is significant in any celltype pair, it plots the whole row. Currently there's no way to plot just the significant means like you want. i can take a look when i have some time next week.
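
The "soft" filtering described above can be illustrated with a minimal sketch. This is not the ktplots source: the function name `soft_filter`, the interaction names, the celltype-pair labels and the p-values are all hypothetical, and only the rule itself (keep the whole row if any celltype pair has p < 0.05) comes from the comment above.

```python
# Toy p-values: rows are interactions, columns are celltype pairs.
# All names and numbers here are made up for illustration.
pvals = {
    "LIGAND_A-RECEPTOR_A": {"B-T": 0.01, "B-NK": 1.0, "T-NK": 0.30},
    "LIGAND_B-RECEPTOR_B": {"B-T": 0.20, "B-NK": 0.80, "T-NK": 1.0},
}

ALPHA = 0.05  # cut-off mentioned in the thread


def soft_filter(pvals, alpha=ALPHA):
    """Keep a whole row if it is significant in at least one celltype pair."""
    return {
        interaction: row
        for interaction, row in pvals.items()
        if any(p < alpha for p in row.values())
    }


kept = soft_filter(pvals)
for interaction, row in kept.items():
    print(interaction, row)
```

Under this rule the first toy interaction is kept together with its non-significant cells, including the one with p = 1, which is why a dot with a p-value of one can still appear on the plot.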

L-wang17 commented 1 month ago

Thanks a lot! That makes a lot of sense. But what do you mean by filtering softly? I checked the p-value of one of the interactions on my plot and it is equal to one; as the last line shows, isn't it supposed to be NULL when it is not lower than 0.05? (I might be totally wrong.) So when I interpret the dot plot, does it actually only highlight the real/significant interactions, with no comparison of one interaction across all the celltype pairs in the graph?

Sorry if this is getting into too much detail; it is important for understanding the biological meaning of the data, so I want to make sure I did not misread the messages.

zktuong commented 1 month ago

i just meant that if a particular interaction in a particular celltype pair is not significant, but is significant in another celltype pair, the result will be kept. the p-value cut off is < 0.05 like you said. only those that are < 0.05 will be highlighted with the red line.

> So when I interpret the dot plot, is it actually only highlight the real/significant interactions, and there is no comparison of one interaction within all the celltype pair in graph?

yes ... the only comparison here (if any), is the intensity of the interaction score that the user decides what to do with.

for instance,

here, even though cellphonedb deems that TNFRSF13B interaction is only significant in 1 celltype pair, it's actually still quite highly expressed in the other two pairs, so i would not discount those two other groups as well.

In this situation, my thought process is that i would use the deg analysis mode as well to see if those genes are significant DEG or not.
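
To make the distinction concrete, here is a small sketch of how one retained row could map onto dots: every cell is drawn with its mean, and only cells with p < 0.05 get the highlight. It is an illustration under assumed values, not ktplots code; the `annotate` helper, the pair labels, the means and the p-values are hypothetical.

```python
ALPHA = 0.05  # cut-off mentioned in the thread

# Hypothetical values for a single interaction across three celltype pairs.
cells = [
    {"pair": "pair_1", "mean": 2.1, "pval": 0.001},
    {"pair": "pair_2", "mean": 1.8, "pval": 0.40},
    {"pair": "pair_3", "mean": 1.7, "pval": 1.0},
]


def annotate(cells, alpha=ALPHA):
    """Return one record per dot: its mean and whether it is highlighted."""
    return [
        {"pair": c["pair"], "mean": c["mean"], "highlight": c["pval"] < alpha}
        for c in cells
    ]


dots = annotate(cells)
for d in dots:
    print(d)
```

All three dots are drawn, so the means can still be compared by eye, but only the first one carries the highlight. The p-value is used once, for the highlight; any comparison between celltype pairs is left to the reader.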

L-wang17 commented 1 month ago

Thank you! It is very clear to me now. Thank you very much. I really appreciate your patience with me.