vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

Normalize variant calls #198

Open cmarkello opened 7 years ago

cmarkello commented 7 years ago

I noticed that the README in the main vg git repo recommends normalizing variant calls for comparison purposes: https://github.com/vgteam/vg#variant-calling

Should this be incorporated into the vg_call module as an option, or is normalization meant for a different use case?

glennhickey commented 7 years ago

This option sounds like a good idea to me. That README needs updating, but as it stands, the bakeoff script still does some normalization:

https://github.com/BD2KGenomics/hgvm-graph-bakeoff-evaluations/blob/master/scripts/computeVariantsDistances.py#L1081-L1093

but only on single-allele variants, it seems (probably due to a vt limitation).

The issue the above was trying to address was fairly subtle: by making large block calls, the cactus (and some other) graphs were writing genotypes equivalent to those from, say, snp1kg in fewer VCF lines. I was worried this effect would skew precision comparisons between different graphs (by lumping multiple SNPs into one false positive), especially on smaller regions. But it's also a handy tool to have when eyeballing calls on the funkier graphs.
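To make the skew concrete, here is a minimal Python sketch of the effect. It is not the bakeoff script's code and not vt's algorithm. The names `decompose_block` and `precision` are hypothetical. Variants are simplified to `(chrom, pos, ref, alt)` tuples, and left-alignment of indels against a reference is ignored.

```python
from typing import List, Set, Tuple

# Simplified variant record: (chrom, 1-based pos, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def trim_shared_bases(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim bases shared by ref and alt, keeping at least one base in each."""
    # Trim the shared suffix first, then the shared prefix.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose_block(variant: Variant) -> List[Variant]:
    """Split an equal-length block substitution into one record per SNP.

    Records whose alleles differ in length (indels) are returned trimmed
    but otherwise unchanged; this sketch does not left-align them.
    """
    chrom, pos, ref, alt = variant
    pos, ref, alt = trim_shared_bases(pos, ref, alt)
    if len(ref) != len(alt):
        return [(chrom, pos, ref, alt)]
    return [
        (chrom, pos + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


def precision(calls: List[Variant], truth: Set[Variant]) -> float:
    """Fraction of called records that exactly match a truth record."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if c in truth) / len(calls)


# Toy data (made up): the truth set has one SNP; the block call
# contains that SNP plus one spurious SNP three bases downstream.
truth = {("x", 100, "A", "T")}
block_calls = [("x", 100, "ACGT", "TCGA")]
snp_calls = [v for call in block_calls for v in decompose_block(call)]

print(precision(block_calls, truth))  # block scored as one false positive
print(precision(snp_calls, truth))    # one true positive, one false positive
```

In this toy case the block call scores as a single false positive even though half of it is correct, while the decomposed calls score one true and one false positive. That is the kind of difference normalization is meant to remove before graphs are compared.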

