netZoo / netZooR

netZooR is a network biology package implemented in R.
https://netzoo.github.io/
GNU General Public License v3.0

Differences in pvalues in MONSTER #335

Open violafanfani opened 1 week ago

violafanfani commented 1 week ago

I am trying to compute the p-values for each TF and started with "monsterCalculateTmPValues". After getting some confusing results, I checked the code in that function and compared it to the code inside "monsterdTFIPlot". They differ at least here:

    ssodm <- apply(res@tm,2,function(x){t(x)%*%x})
    ssodm <- apply(res@tm,1,function(x){t(x)%*%x})

https://github.com/netZoo/netZooR/blob/9dd9438910afab9beee91799ab743448c3e14a84/R/MONSTER.R#L756

Which one is correct? Can you check whether this is a mistake or intended behavior?
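To make the difference concrete, here is a minimal sketch on a made-up, asymmetric 3x3 matrix. The matrix `tm` and its values are purely illustrative and are not taken from MONSTER output; only the `apply(..., function(x){t(x)%*%x})` expression comes from the lines quoted above.

```r
# Toy asymmetric matrix standing in for a transition matrix (values are made up).
tm <- matrix(c(1.0, 0.2, 0.0,
               0.5, 1.0, 0.1,
               0.0, 0.9, 1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Sum of squares along each row (MARGIN = 1) and each column (MARGIN = 2).
# t(x) %*% x is a 1x1 matrix; apply() simplifies the results to a named vector.
ss_rows <- apply(tm, 1, function(x) { t(x) %*% x })
ss_cols <- apply(tm, 2, function(x) { t(x) %*% x })

print(ss_rows)
print(ss_cols)

# For a symmetric matrix the two margins give the same values.
tm_sym <- (tm + t(tm)) / 2
ss_rows_sym <- apply(tm_sym, 1, function(x) { t(x) %*% x })
ss_cols_sym <- apply(tm_sym, 2, function(x) { t(x) %*% x })
print(all.equal(ss_rows_sym, ss_cols_sym))
```

The row and column sums of squares agree only when the matrix is symmetric, so the two `apply()` calls are interchangeable only in that case.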

marouenbg commented 1 week ago

Hey @violafanfani cc @taraeicher ,

They're the same because transition matrices are symmetrical.

Marouen

marouenbg commented 1 week ago

I take that back: monsterdTFIPlot is the correct one!

marouenbg commented 1 week ago

Good catch Viola! Can you please PR these 2 lines? https://github.com/netZoo/netZooR/blob/9dd9438910afab9beee91799ab743448c3e14a84/R/MONSTER.R#L751 https://github.com/netZoo/netZooR/blob/9dd9438910afab9beee91799ab743448c3e14a84/R/MONSTER.R#L756

marouenbg commented 1 week ago

But I expect the results to be only very slightly different, because the statistic measures deviation from the identity matrix. For a given matrix, comparing its columns to the columns of a null matrix, or its rows to the rows of the same null matrix, gives similar statistics. They might even be equal, but for consistency, let's change it.

violafanfani commented 1 week ago

I am not super familiar with the method, so I can only tell you what I observed. The results are very different, which we could also expect from figure 2 in the paper (https://pubmed.ncbi.nlm.nih.gov/29237467/#&gid=article-figures&pid=fig-2-uid-1). You can see that the TM behaves differently along the columns and the rows; TMs are not symmetrical (at all).

marouenbg commented 1 week ago

@violafanfani Yes, I took back my comment about symmetry. Can you post some p-values for TFs with each function?

marouenbg commented 1 week ago

It is not just the values of rows vs. columns: the functions also compare rows or columns to a null distribution of rows or columns. As you can see below, the null is also sampled across rows or columns, which is why I am not expecting a strong difference (i.e., a nonsignificant TF becoming significant). https://github.com/netZoo/netZooR/blob/9dd9438910afab9beee91799ab743448c3e14a84/R/MONSTER.R#L674 https://github.com/netZoo/netZooR/blob/9dd9438910afab9beee91799ab743448c3e14a84/R/MONSTER.R#L751
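As a rough sketch of the comparison described here (not the netZooR implementation), the snippet below computes a per-TF sum of squares along one margin for an observed matrix and for a set of null matrices, then turns it into a z-score p-value. Everything in it is an assumption made for illustration: the helper names `random_tm` and `zscore_pvalues`, the simulated matrices, and the one-sided upper-tail test.

```r
set.seed(1)
n_tf <- 4
tf_names <- paste0("TF", seq_len(n_tf))

# Hypothetical generator: identity matrix plus small Gaussian noise.
random_tm <- function(sd_off) {
  m <- diag(n_tf) + matrix(rnorm(n_tf^2, sd = sd_off), n_tf, n_tf)
  dimnames(m) <- list(tf_names, tf_names)
  m
}

# Simulated "observed" matrix with extra off-diagonal mass in the TF1 row only.
observed <- random_tm(0.1)
observed["TF1", ] <- observed["TF1", ] + c(0, 2, 2, 2)

# Simulated null matrices (stand-ins for permutation-based nulls).
null_tms <- replicate(50, random_tm(0.1), simplify = FALSE)

# Assumed statistic and test: sum of squares along `margin`,
# z-scored against the same statistic in the nulls, upper tail.
zscore_pvalues <- function(tm, nulls, margin) {
  sum_sq <- function(m) apply(m, margin, function(x) sum(x^2))
  obs <- sum_sq(tm)
  null_mat <- sapply(nulls, sum_sq)  # rows = TFs, columns = null samples
  z <- (obs - rowMeans(null_mat)) / apply(null_mat, 1, sd)
  pnorm(z, lower.tail = FALSE)
}

p_rows <- zscore_pvalues(observed, null_tms, margin = 1)
p_cols <- zscore_pvalues(observed, null_tms, margin = 2)
print(p_rows)
print(p_cols)
```

In this toy setup the extra mass sits in one row, so it dominates the row-wise statistic for TF1 but is spread across the column-wise statistics of the other TFs. Sampling the null along the same margin therefore does not by itself make the two sets of p-values agree.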

violafanfani commented 1 week ago

From my experiment on the yeast cell cycle test data, it is very (very) different. TFs that sit in the middle of the dTFI plot come out as significant.

marouenbg commented 1 week ago

Can you post them here?

violafanfani commented 1 week ago

I am uploading the dTFI figure with the TFs significant at pval<0.05. The "old" files use the code in netZooR to compute the p-values, while the other files use p-values computed as in the dTFI figure. In the logs I also print the first 10 p-values in the two cases:

Old (monsterCalculateTmPValues(res, method = 'z-score')):

                pval
RCS1    2.339182e-01
RLR1    1.245712e-04
SIG1    1.776618e-05
YBL005W 4.750487e-02
YBL021C 1.000000e+00
YBL103C 9.997405e-01
YBR049C 8.400741e-01
YBR083W 3.208280e-03
YBR182C 4.997253e-01
YBR240C 4.766604e-03

Correct (monsterCalculateTmPValues(res, method = 'z-score') with axis = 2):

             pval
RCS1    0.7948747
RLR1    0.7697828
SIG1    0.7697748
YBL005W 0.4145877
YBL021C 0.2524840
YBL103C 0.5372631
YBR049C 0.6533847
YBR083W 0.4694112
YBR182C 0.2339759
YBR240C 0.6460631

old_monster_figure.pdf monster_figure.pdf old_my_log.txt my_log.txt
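For a quick side-by-side, the ten p-values posted in this comment can be put into one data frame and thresholded at 0.05. The column names `old` and `corrected` are just labels chosen for this snippet.

```r
# The ten p-values posted in this comment, keyed by TF.
pvals <- data.frame(
  old = c(2.339182e-01, 1.245712e-04, 1.776618e-05, 4.750487e-02,
          1.000000e+00, 9.997405e-01, 8.400741e-01, 3.208280e-03,
          4.997253e-01, 4.766604e-03),
  corrected = c(0.7948747, 0.7697828, 0.7697748, 0.4145877, 0.2524840,
                0.5372631, 0.6533847, 0.4694112, 0.2339759, 0.6460631),
  row.names = c("RCS1", "RLR1", "SIG1", "YBL005W", "YBL021C",
                "YBL103C", "YBR049C", "YBR083W", "YBR182C", "YBR240C")
)

# TFs below the 0.05 threshold under each computation.
sig_old <- rownames(pvals)[pvals$old < 0.05]
sig_corrected <- rownames(pvals)[pvals$corrected < 0.05]

print(sig_old)
print(sig_corrected)
```

Among these ten TFs, five fall below 0.05 with the old computation (RLR1, SIG1, YBL005W, YBR083W, YBR240C) and none do with the corrected one.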

marouenbg commented 1 week ago

Wow this is really different! Thanks for finding this 👍 💯