Defining evaluation metrics

mbhall88 commented 5 years ago

It is useful to have definitions for certain evaluation metrics and terms in the context of this project.

Inspiration for these metrics-of-interest comes from the paper "Best practices for evaluating single nucleotide variant calling methods for microbial genomics". Of particular interest for this discussion is Table 1 and Figure 3.

mbhall88 commented 5 years ago

True Positive (TP)

Correct variant allele or position call

This is defined as an entry in pandora's VCF which, when taken with flanking sequence, maps over a variant position in the reference panel and has the correct base call at the expected position.

Mismatches are allowed in the flanks.

False Positive (FP)

Incorrect variant allele or position call

This is defined as an entry in pandora's VCF which, when taken with flanking sequence, maps over a variant position in the reference panel and has the incorrect base call at the expected position.

Mismatches are allowed in the flanks.

False Negative (FN)

Incorrect reference genotype or no call

This is defined as an entry in the reference panel which does not have any variants from pandora's VCF that map across its variant position.
Note: a pandora variant call can map to an entry in the reference panel but may not map across the middle position (which is the variant position).

True Negative (TN)

Correct reference genotype or no call

This is any position that we have correctly left the same as the reference. True negatives would only really be relevant if we decide to apply the variant calls from pandora onto the consensus sequence from pandora and do a base-by-base comparison of that to the reference sequence.
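As a rough illustration of the TP/FP/FN definitions above, here is a minimal sketch of how the counts could be tallied once the calls have been mapped to the reference panel. The `Call` record, its field names, and the `tally` function are hypothetical and not part of pandora or any existing script; the mapping step itself is assumed to have happened already.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple, Optional


class Call(NamedTuple):
    """Hypothetical record: one pandora VCF entry (with flanks) after mapping."""
    panel_id: str               # reference panel entry the call mapped to
    covers_variant: bool        # does the alignment span the variant position?
    called_base: Optional[str]  # base at the variant position, if covered


def tally(panel: dict[str, str], calls: list[Call]) -> Counter:
    """Count TP, FP and FN. `panel` maps panel entry id -> expected base."""
    counts = Counter(TP=0, FP=0, FN=0)
    covered = set()
    for call in calls:
        if call.panel_id not in panel or not call.covers_variant:
            continue  # mapped to an entry, but not across its variant position
        covered.add(call.panel_id)
        if call.called_base == panel[call.panel_id]:
            counts["TP"] += 1  # correct base at the expected position
        else:
            counts["FP"] += 1  # incorrect base at the expected position
    # FN: panel entries with no call mapping across their variant position
    counts["FN"] = len(set(panel) - covered)
    return counts


panel = {"p1": "A", "p2": "C", "p3": "G"}
calls = [Call("p1", True, "A"), Call("p2", True, "T"), Call("p3", False, None)]
print(dict(tally(panel, calls)))
```

Note that, following the definitions literally, a panel entry covered only by an incorrect call counts as an FP and not as an FN. Mismatches in the flanks are ignored here because only the base at the variant position is compared.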

Accuracy

Ratio of correct calls to total calls and variants

Calculation: (TP+TN)/(TP+FP+TN+FN)

The discussion point here is whether or not to include TN.

Specificity

Non-variants not called as variants relative to the total non-variants

Calculation: TN/(TN+FP)

If we decide to "ignore" TNs then this metric would not be relevant.

Sensitivity / Recall

True variants called relative to all variants

Calculation: TP/(TP+FN)

A very important metric for us, and it should be quite straightforward to calculate.

Precision

True variants called relative to total calls

Calculation: TP/(TP+FP)

Another very important metric for us. However, we need to first decide on our definition of FP.

False positive rate

Non-variants called relative to the total non-variants

Calculation: FP/(TN+FP)

Not relevant if we decide to ignore TNs.
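To make the calculations above concrete, here is a minimal sketch that derives the metrics from raw counts. The function names `ratio` and `metrics` are made up for illustration. Passing `tn=None` models the option of ignoring TNs, in which case only recall and precision are reported.

```python
from typing import Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics(tp: int, fp: int, fn: int, tn: Optional[int] = None) -> dict:
    """Compute the metrics defined above from TP/FP/FN (and optionally TN)."""
    out = {
        "recall": ratio(tp, tp + fn),     # TP/(TP+FN)
        "precision": ratio(tp, tp + fp),  # TP/(TP+FP)
    }
    if tn is not None:
        # These three all depend on TN, so they are skipped when TN is ignored.
        out["accuracy"] = ratio(tp + tn, tp + fp + tn + fn)
        out["specificity"] = ratio(tn, tn + fp)
        out["false_positive_rate"] = ratio(fp, tn + fp)
    return out


print(metrics(tp=8, fp=2, fn=2))
print(metrics(tp=8, fp=2, fn=2, tn=88))
```

Returning `None` for an empty denominator is a design choice of this sketch, so that a sample with no calls does not raise a `ZeroDivisionError`.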

mbhall88 commented 5 years ago

I think there is some ambiguity around the false positive definition. This could conceivably be defined as the total number of mismatches from mapping all pandora variant calls on to the original reference (not just the reference panel). The catch here though I guess is that it becomes messy as to whether it is de novo's fault or if it is just pandora's genotype model that is causing any given mismatch.

iqbal-lab commented 5 years ago

small typo in definition of accuracy: This Calculation: TP+TN/TP+FP+TN+FN should be Calculation: TP/TP+FP+TN+FN i think

iqbal-lab commented 5 years ago

I think it's easy to separate the impact of de novo as follows (making the assumption that we have ensured that all simulated SNPs are outside the PRG). Define candidates to mean the list of alleles that de novo generates.

For de novo, define
recall = % of simulated mutations where the mutant allele was included in the candidates
precision = % of slices where we perform de novo that include a simulated mutation within them

These measure whether de novo is doing its job.

Then, over and above these, measure sens/spec etc for the VCF as you've mentioned above - these measure how well pandora performs when including de novo, and this is our bottom line, combining de novo and genotyping
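A minimal sketch of the two de novo measures described above, expressed as fractions. The data layout is an assumption made purely for illustration: simulated mutations as `(locus, mutant_allele)` pairs, candidates as a mapping from locus to the set of alleles de novo generated, and slices as `(locus, start, end)` half-open intervals.

```python
from typing import Optional


def de_novo_recall(simulated, candidates) -> Optional[float]:
    """Fraction of simulated mutations whose mutant allele is among the candidates."""
    if not simulated:
        return None
    found = sum(
        1 for locus, allele in simulated
        if allele in candidates.get(locus, set())
    )
    return found / len(simulated)


def de_novo_precision(slices, mutation_positions) -> Optional[float]:
    """Fraction of de novo slices that contain at least one simulated mutation."""
    if not slices:
        return None
    hits = sum(
        1 for locus, start, end in slices
        if any(start <= pos < end for pos in mutation_positions.get(locus, []))
    )
    return hits / len(slices)


# Hypothetical toy data
simulated = [("geneA", "ACGT"), ("geneB", "TTGA")]
candidates = {"geneA": {"ACGT", "ACCT"}, "geneB": {"TTGG"}}
slices = [("geneA", 10, 20), ("geneB", 0, 5), ("geneC", 3, 9)]
mutation_positions = {"geneA": [12], "geneB": [7]}

print(de_novo_recall(simulated, candidates))
print(de_novo_precision(slices, mutation_positions))
```

This treats "included in the candidates" as an exact sequence match, which is a simplification; a real comparison might need to allow for differing flank lengths.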

iqbal-lab commented 5 years ago

would it be possible to include a plot with
y axis = recall: what % of sim-mutations are correctly present and genotyped in the VCF
x axis = error rate: what % of calls in the VCF are wrong
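One way the points for such a plot could be produced is sketched below. It assumes each call in the VCF carries some per-call score to filter on and has already been labelled correct or wrong; both the score and the `recall_error_curve` function are hypothetical. It also assumes each correct call corresponds to a distinct simulated mutation.

```python
def recall_error_curve(calls, n_simulated):
    """Sweep a score threshold; return (threshold, error_rate, recall) tuples.

    calls: list of (score, is_correct) pairs, one per VCF call.
    n_simulated: total number of simulated mutations (must be > 0).
    """
    if n_simulated <= 0:
        raise ValueError("n_simulated must be positive")
    points = []
    for threshold in sorted({score for score, _ in calls}):
        # Keep only the calls at or above the current threshold.
        kept = [is_correct for score, is_correct in calls if score >= threshold]
        correct = sum(kept)
        error_rate = (len(kept) - correct) / len(kept)  # x axis: wrong calls / calls
        recall = correct / n_simulated                  # y axis: correct / simulated
        points.append((threshold, error_rate, recall))
    return points


calls = [(10, True), (20, True), (5, False), (30, True)]
for point in recall_error_curve(calls, n_simulated=4):
    print(point)
```

Each tuple gives one point on the plot; the error rate here is FP/(TP+FP), i.e. 1 minus precision.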

mbhall88 commented 5 years ago

small typo in definition of accuracy: This Calculation: TP+TN/TP+FP+TN+FN should be Calculation: TP/TP+FP+TN+FN i think

Hmm, ok. I was just going by the definition in Table 1 of the paper I quoted in the first comment.

mbhall88 commented 5 years ago

I think it's easy to separate the impact of de novo as follows (making the assumption that we have ensured that all simulated SNPs are outside the PRG). Define candidates to mean the list of alleles that de novo generates.

What do you mean by "ensured that all simulated SNPs are outside the PRG"?

For de novo, define recall = % of simulated mutations where the mutant allele was included in the candidates. precision = % of slices where we perform de novo that include a simulated mutation within them

So that would involve mapping each probe in the reference panel to each candidate paths fasta file produced by de novo and ensuring the base within the probe that is the mutation maps without mismatch?

iqbal-lab commented 5 years ago

small typo in definition of accuracy: This Calculation: TP+TN/TP+FP+TN+FN should be Calculation: TP/TP+FP+TN+FN i think

Hmm, ok. I was just going by the definition in Table 1 of the paper I quoted in the first comment.

argh!! my mistake!

iqbal-lab commented 5 years ago

in this

I think it's easy to separate the impact of de novo as follows (making the assumption that we have ensured that all simulated SNPs are outside the PRG). Define candidates to mean the list of alleles that de novo generates.

What do you mean by "ensured that all simulated SNPs are outside the PRG"?

i just mean that if you simulate a path in the prg, and insert a new snp that happens to be in the prg already, then there is no de novo to do

iqbal-lab commented 5 years ago

for this

For de novo, define recall = % of simulated mutations where the mutant allele was included in the candidates. precision = % of slices where we perform de novo that include a simulated mutation within them

So that would involve mapping each probe in the reference panel to each candidate paths fasta file produced by de novo and ensuring the base within the probe that is the mutation maps without mismatch?

i've got confused about what the ref panel is. i just meant check whether the thing you simulated was one of the candidates
