Evaluating MapCaller - Githubissues

holtjma commented 4 years ago

Hello,

As mentioned in Issue #19, I'm evaluating MapCaller using RTG vcfeval along with the GIAB truth set. To get it to run, I wrote a quick script to reformat the samples moving the GT, AD, and DP fields from INFO into a FORMAT and sample field. I also made the assumption that the reference allele depth would equal "DP - AD" (please correct if that's a bad assumption given the current output format).

I ran it on three replicates of NA12878 using b38 (with the HLA and alt contigs, ~3300 contigs total) and the results were less than expected with regards to sensitivity (for comparison, most pipelines I've tested are over 99% for sensitivity with higher variability in precision):

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            2865332        2865535      65797     677155     0.9776       0.8088     0.8852
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            2873589        2873783      62375     668898     0.9788       0.8112     0.8871
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            2932879        2933133      34907     609608     0.9882       0.8279     0.9010

One of the main issues I saw was a lack of a GT call for several of the fields. For example, this was a line in one of the outputs where I would have expected a GT of "1/1", but the information was completely absent:

chr1    826577  .   A   AT  30  PASS    DP=35;AD=35;AF=1.000;TYPE=INSERT

I don't know whether this is a bug, user error on my part, or working as intended (i.e. sometimes GT field is simply missing). Regardless, it would be good to be able to force a GT call for the purpose of evaluation.

hsinnan75 commented 4 years ago

Thank you for testing and evaluating MapCaller. I think this should be a bug. I did not output GT field for indels. I'll fix this bug.

hsinnan75 commented 4 years ago

I've modified the VCF output format and updated MapCaller (v0.9.9.8). Please test and evaluate MapCaller again and let me know if there is any suggestions. Thank you very much.

hsinnan75 commented 4 years ago

BTW, could you provide me your test sets so that I can analyze the results? Thank you!

holtjma commented 4 years ago

I'll put it on my to-do list to run with the newer version and get back to you within the next couple days hopefully.

The sequencing runs I'm using were sequenced internally, but they're all replicates of NA12878 and I used the Genome in a Bottle truth set for evaluation. I might be able to share the exact runs if you're interested, but it's not something I'm authorized to do myself.

holtjma commented 4 years ago

New results look better, looks like adding the indel GT fields increasing precision/recall by about 10% each. Most of the remaining ones I'm simply not finding entries for in MapCaller's output (i.e. true false negatives for v0.9.9.8):

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            3284554        3383851     218011     257933     0.9395       0.9272     0.9333
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            3294325        3394001     214305     248162     0.9406       0.9299     0.9352
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None            3367435        3472763     204531     175052     0.9444       0.9506     0.9475

hsinnan75 commented 4 years ago

Thank you for letting me know the evaluation result. There is one more thing I'd like to say about the evaluation. According to my experience, some of the indel events are ambiguous. There are more than one alignments including such indels having the same alignment scores. Therefore, the predicted positions of those indel events maybe a few bases away from the annotated ones. I was wondering if your evaluation metric considers such ambiguity. Anyway, I appreciate the evaluation you did for me. Thank you!!!

holtjma commented 4 years ago

Yep, RTG vcfeval (https://github.com/RealTimeGenomics/rtg-tools) is actually really good at handling indels (that's the main reason I use it for evaluation) and is one of the standards used for this type of thing. The other one I see a lot is hap.py (https://github.com/Illumina/hap.py) which wraps RTG but has another evaluation method.

hsinnan75 / MapCaller

Evaluating MapCaller #20