replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity
https://mult1fractal.github.io/wtp-documentation/
GNU General Public License v3.0

P-value and score #184

Open FabiKeiki opened 1 year ago

FabiKeiki commented 1 year ago

I'm intrigued by the choice to mix p-values and scores for the per-contig phage prediction. A score is a measure of some kind, while a p-value expresses the statistical significance of that score. Also, the p-values shown in the final report are not p-values but 1 - p-value. As a result, we can have a high score with a high p-value (low statistical significance) and a low score with a low p-value (high statistical significance). According to the benchmarking of the various tools (Ho et al.), I can trust a score of 0.8 (very few false positives), but I wouldn't trust a p-value of 0.2 (0.8 in the final report). So isn't it dangerous to mix these values, since they do not measure the same thing? I think virfinder and deepvirfinder also give you a score; maybe report the score instead and eliminate all values that are not statistically significant. As it stands, I think the normed sum of phage tools is not really usable.
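To make the concern concrete, here is a minimal sketch in Python. The contig names, the numbers, and the 0.05 threshold are all made up for illustration, and none of this is taken from the WtP code. It shows how a reported 1 - p-value can look like a strong score, and how the suggested alternative (keep the score, drop non-significant results) would behave.

```python
# Hypothetical per-contig results for one tool; names and numbers are illustrative only.
contigs = {
    "contig_A": {"score": 0.90, "p_value": 0.20},  # high score, not significant
    "contig_B": {"score": 0.40, "p_value": 0.01},  # low score, significant
}

ALPHA = 0.05  # assumed significance threshold, not a WtP setting


def reported_value(p_value):
    """Value a report would show if it printed 1 - p-value."""
    return 1.0 - p_value


def significant_score(result, alpha=ALPHA):
    """Return the score only when its p-value is significant, else None."""
    return result["score"] if result["p_value"] <= alpha else None


for name, result in contigs.items():
    shown = round(reported_value(result["p_value"]), 2)
    print(name, "1-p:", shown, "filtered score:", significant_score(result))
```

In this toy data, contig_A would show 0.8 in a 1 - p-value column even though its p-value of 0.2 is not significant, which is exactly the case I would not trust.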

mult1fractal commented 1 year ago

Hey, my initial thought was to collect the information (scores and p-values) in an overview table. As the articles state, scores and p-values tell the user how likely it is that a contig is a phage. I therefore put these values in the overview table and explained in Tab. 2 what they are. sum_normed is a value I use to sort the table by the most likely phages (the value itself has no meaning). I agree that I need to rename it or explain it in Tab. 2 as well. In Tab. 2, I changed the virfinder and deepvirfinder entries to "score", where they were named wrongly. The actual numbers in the overview table are the scores, not the p-values, from both tools. I also double-checked this in the code. Thank you for the hint!
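For readers wondering what a sort key like sum_normed could look like, here is a rough Python sketch. It assumes the value is the sum of the per-tool values divided by the number of tools that reported one; that formula, the tool names, and the numbers are assumptions for illustration, not the actual WtP implementation.

```python
def sum_normed(values):
    """Sum of the available per-tool values divided by how many tools reported one.

    Assumed formula for illustration only; tools without a result are given as None.
    """
    present = [v for v in values.values() if v is not None]
    return sum(present) / len(present) if present else 0.0


# Hypothetical overview table: contig -> per-tool value (None = no result).
table = {
    "contig_1": {"tool_a": 0.9, "tool_b": 0.8, "tool_c": None},
    "contig_2": {"tool_a": 0.3, "tool_b": 0.5, "tool_c": 0.4},
}

# Sort contigs so the highest combined value comes first.
ranked = sorted(table, key=lambda contig: sum_normed(table[contig]), reverse=True)
print(ranked)
```

A key like this is only useful for ordering rows. Because it averages values that may sit on different scales, it should not be read as a probability.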