rcastelo / GSVA

Gene set variation analysis

Association between pathway score of two pathways #176

Open Chrisdoan9 opened 3 months ago

Chrisdoan9 commented 3 months ago

Hi team,

After running gsva, I have pathway scores. I then tried to find associations between pathways of interest using cor.test(). However, when I pick two random pathways and test their association, I very often get a very small p-value, such as < 1e-20, while r values > 0.5 or < -0.5 are not common. Could you please comment on this? Is the very small p-value a natural consequence of the GSVA algorithm, or is it because of my dataset? I love this package because it is maintained.

This plot is great. I have around 2000 samples and 7000 pathways, so how can I make a nice plot like this? I am just worried it will be too dense. (image) Also, there are not 49 pathways in it, as the vignette says.

Thank you so much!

rcastelo commented 3 months ago

Hi, without seeing the actual code that reproduces the problem, I cannot say much; my intuition is that this may have to do with the data you are using as input. Regarding the visualization question, you certainly cannot create a heatmap with 7000 rows, one per pathway, and expect to be able to read the labels. First you should ask yourself whether it makes sense to work with 7000 pathways, and what question you want to answer with the visualization.
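
As a rough sketch of one way to get a readable heatmap from that many pathways, the example below keeps only the pathways whose scores vary most across samples and plots those with base R's heatmap(). Everything here is an assumption for illustration: the matrix `scores` is simulated, the cutoff of 50 pathways is arbitrary, and ranking by variance is just one possible filter; the right subset depends on the question being asked.

```r
# Simulated stand-in for a GSVA score matrix: pathways in rows, samples in columns.
set.seed(1)
n_pathways <- 7000
n_samples <- 200
scores <- matrix(rnorm(n_pathways * n_samples, sd = 0.1),
                 nrow = n_pathways,
                 dimnames = list(paste0("pathway_", seq_len(n_pathways)),
                                 paste0("sample_", seq_len(n_samples))))

# Rank pathways by their variance across samples and keep the top 50
# (50 is an arbitrary, hypothetical cutoff).
pathway_var <- apply(scores, 1, var)
top_idx <- order(pathway_var, decreasing = TRUE)[1:50]
top_scores <- scores[top_idx, ]

# Base R heatmap with hierarchical clustering of rows and columns.
# Column labels are blanked because there are too many samples to read.
heatmap(top_scores, labCol = rep("", n_samples), scale = "none",
        margins = c(2, 10))
```

With 50 rows the pathway labels stay legible; with a real GSVA result, `scores` would be replaced by the matrix of enrichment scores.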

Chrisdoan9 commented 3 months ago

Thank you so much! I have pathway scores for 7000 pathways, and I pick two random pathways: cor.test(A, B). Each subject's pathway scores range from -0.5 to 0.5, with the 25th to 75th percentiles mostly within -0.1 to 0.1. I will check with another dataset to see whether it is because of my data.

I checked with the data from this tutorial: https://alexslemonade.github.io/refinebio-examples/02-microarray/pathway-analysis_microarray_03_gsva.html

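# gsva_results comes from the tutorial: gene sets in rows, samples in columns
gsva_results <- as.matrix(gsva_results)
# Transpose so that each gene set becomes a column of a data frame
t_gsva <- as.data.frame(t(gsva_results))
# Pearson correlation test between the scores of two gene sets
cor_test_result <- cor.test(t_gsva$HALLMARK_TNFA_SIGNALING_VIA_NFKB, t_gsva$HALLMARK_HYPOXIA)
cor_test_result$p.value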

The p-value is 3.106363e-75, so I think the small p-value is due to the nature of the algorithm. So I am not sure whether I can say that two pathways are associated when the p-value is < 0.05 and the r value is > 0.5.
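
For context on the p-value versus r question: for a Pearson correlation, the p-value reported by cor.test() depends on both r and the number of samples n, through the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below is illustrative only. The helper `p_from_r` is a hypothetical name, and n = 2000 is taken from the sample count mentioned earlier in the thread. It shows how a modest r can come with a very small p-value when n is large; it does not rule out other explanations, such as the input data.

```r
# Two-sided p-value of a Pearson correlation r observed on n samples,
# using the same t statistic as cor.test(method = "pearson").
# p_from_r is a hypothetical helper name used only for this sketch.
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Same set of r values, evaluated at a small and a large sample size.
r_values <- c(0.05, 0.1, 0.2, 0.5)
p_table <- data.frame(
  r = r_values,
  p_n20 = signif(sapply(r_values, p_from_r, n = 20), 3),
  p_n2000 = signif(sapply(r_values, p_from_r, n = 2000), 3)
)
print(p_table)
```

Because the p-value shrinks as n grows for any fixed nonzero r, it may be more informative to report r together with the confidence interval that cor.test() returns in `conf.int` than to rely on a p-value threshold alone.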