P-value and score

replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity

GNU General Public License v3.0

103 stars, 15 forks

I'm intrigued by the choice to mix p-values and scores in the per-contig phage prediction. A score is a measure of some kind, whereas a p-value expresses the statistical significance of that score. Also, the p-values shown in the final report are not p-values but 1 - p-value. As a result, we can have a high score with a high p-value (low statistical significance) and a low score with a low p-value (high statistical significance). According to the benchmarking of the various tools (Ho et al.), I can trust a score of 0.8 (very few false positives); I wouldn't, however, trust a p-value of 0.2 (0.8 in the final report). So isn't it dangerous to mix these pieces of information, given that they don't measure the same thing? I think virfinder and deepvirfinder also give you a score, so maybe report the score instead and eliminate all values that are not statistically significant. As it stands, I don't think the normed sum of the phage tools is really usable.
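To make the concern concrete, here is a minimal Python sketch. It assumes the normed sum is a plain mean of the per-tool values, which is only a guess at how the report computes it. The function names, the 0.05 cutoff, and all numbers are hypothetical and not taken from WtP.

```python
from statistics import mean

ALPHA = 0.05  # assumed significance cutoff; not taken from WtP


def mixed_normed_sum(scores, p_values):
    """Mean of raw scores and (1 - p) values, treated as if comparable."""
    return mean(list(scores) + [1.0 - p for p in p_values])


def significant_score_mean(results, alpha=ALPHA):
    """Mean of scores from tools whose p-value is at most alpha (0.0 if none)."""
    kept = [score for score, p in results if p <= alpha]
    return mean(kept) if kept else 0.0


# Hypothetical contig: one tool reports a score of 0.8,
# another reports p = 0.2, shown in the report as 1 - p = 0.8.
mixed = mixed_normed_sum(scores=[0.8], p_values=[0.2])

# Hypothetical (score, p-value) pairs from tools that report both.
filtered = significant_score_mean([(0.9, 0.01), (0.8, 0.2)])

print(f"mixed mean: {mixed:.2f}")
print(f"significant-only mean: {filtered:.2f}")
```

In the mixed case, the trustworthy score of 0.8 and the non-significant p-value of 0.2 contribute the same 0.8 to the mean, so the result cannot tell them apart. The filtered variant keeps only scores backed by a significant p-value, which is the alternative suggested above.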

Hey, my initial thought was to collect the information (scores and p-values) in an overview table. As the articles state, scores and p-values tell the user how likely a contig is to be a phage. Therefore I put these values in the overview table and explained in Tab. 2 what they are. sum_normed is a value I used to sort the table so that the most likely phages come first (the value itself has no meaning). I agree that I need to rename it or explain it in Tab. 2 as well. In Tab. 2 I changed virfinder and deepvirfinder to "score", where they were named wrongly. The actual numbers in the overview table are the scores and not the p-values from both tools. I also double-checked this in the code. Thank you for the hint!

P-value and score #184