wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Improve selection of variant annotation for the variants.tsv #124

Open malachig opened 1 year ago

malachig commented 1 year ago

Currently, a sub-optimal variant annotation is sometimes selected in the TSV. While this doesn't impact the VCF or the pVACtools analysis, it is handy to have a table of variants where the "top" variant annotation is selected (one row per variant).
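To make the "one row per variant" goal concrete, here is a minimal Python sketch of how a top annotation could be chosen per variant. It is not what the pipeline or VEP actually does: the field names (`variant`, `gene`, `consequence`, `mane_select`, `canonical`), the priority order, and the tiny severity list are all assumptions made for illustration.

```python
# Hypothetical, simplified priority. This is NOT VEP's pick algorithm.
# Illustrative subset of consequence terms, most severe first.
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]


def priority_key(ann):
    """Smaller tuples sort first: MANE Select, then canonical, then severity."""
    consequence = ann["consequence"]
    rank = SEVERITY.index(consequence) if consequence in SEVERITY else len(SEVERITY)
    return (not ann.get("mane_select", False), not ann.get("canonical", False), rank)


def select_top_annotations(annotations):
    """Return a dict mapping each variant to its single best annotation."""
    top = {}
    for ann in annotations:
        key = ann["variant"]
        if key not in top or priority_key(ann) < priority_key(top[key]):
            top[key] = ann
    return top


# Illustrative rows only; the flag values are assumed for the sketch.
variant = ("chr9", 21970920, "CA", "C")
rows = [
    {"variant": variant, "gene": "ENSG00000264545",
     "consequence": "intron_variant", "mane_select": False, "canonical": True},
    {"variant": variant, "gene": "CDKN2A",
     "consequence": "frameshift_variant", "mane_select": True, "canonical": True},
]
top = select_top_annotations(rows)
print(top[variant]["gene"])
```

With these assumed inputs, the annotation on the MANE Select transcript wins regardless of input order, which is the behaviour we would want reflected in the TSV.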

malachig commented 1 year ago

Further investigation of this issue has revealed that we are likely experiencing a bug in VEP v105 that has since been fixed.

We have been noticing annoying discrepancies between the variant annotations that get prioritized in pVACview and those coming out of VEP --flag_pick.

For example, a variant like this: chr9 21970920 CA C.

This is a frameshift variant on the MANE select transcript of the cancer gene CDKN2A.

But the "picked" variant annotation is an intron variant for another nearby predicted gene (ENSG00000264545) that doesn't even have a name.

When you dig into this example, it's clear that VEP is not even following its own stated rules for prioritizing the annotation that gets picked.
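A rough way to find other variants affected by the same problem is to scan the VEP CSQ annotations for cases where the picked entry is not on a MANE Select transcript even though one is available. The sketch below assumes the CSQ format includes `PICK` and `MANE_SELECT` fields, and that a MANE Select annotation should have been preferred, as this example suggests. The trimmed format string and the placeholder IDs are hypothetical; a real run would read the format from the VCF header.

```python
def parse_csq(csq_value, csq_format):
    """Split a CSQ INFO value into one dict per annotation.

    csq_format is the pipe-separated field list from the VCF header;
    csq_value holds comma-separated annotations with pipe-separated fields.
    """
    fields = csq_format.split("|")
    return [dict(zip(fields, entry.split("|"))) for entry in csq_value.split(",")]


def pick_disagrees_with_mane(entries):
    """True if a MANE Select annotation exists but a picked entry is not on it."""
    picked = [e for e in entries if e.get("PICK") == "1"]
    has_mane = any(e.get("MANE_SELECT") for e in entries)
    return has_mane and any(not e.get("MANE_SELECT") for e in picked)


# Hypothetical, trimmed field list; real headers contain many more fields.
csq_format = "Consequence|SYMBOL|Gene|MANE_SELECT|PICK"

# Illustrative CSQ value modelled on the chr9 21970920 CA>C example.
# GENE_ID_PLACEHOLDER and MANE_ID_PLACEHOLDER stand in for real identifiers.
csq_value = (
    "frameshift_variant|CDKN2A|GENE_ID_PLACEHOLDER|MANE_ID_PLACEHOLDER|,"
    "intron_variant||ENSG00000264545||1"
)

entries = parse_csq(csq_value, csq_format)
suspicious = pick_disagrees_with_mane(entries)
print(suspicious)
```

Running a check like this over outputs from both VEP versions would give a quick count of how many picks change after an upgrade.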

And if you use the latest version of VEP (even with the same v105 annotations we are using), the problem goes away.

To resolve this issue we should consider updating to Ensembl v110 (both software and annotations).