griffithlab / civicpy

A python interface for the CIViC db application
MIT License

Update CSQ field names and order to better reflect official annotation guidelines #39

Closed susannasiebert closed 5 years ago

susannasiebert commented 5 years ago

After going over https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docs/variant_annotation.md and http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf, this PR reorders and renames some CSQ fields. The two documents seem to contradict each other in some places (e.g., using Gene Name vs. SYMBOL). In those instances I went with the Variant Annotation/VEP field name, since that is what BigQuery seems to recommend.
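For anyone following along, here is a rough sketch of why field order matters for CSQ: the header's `Format:` list and the pipe-delimited values have to line up position by position. The field names below are only an illustrative subset of VEP-style names, not the list this PR settles on, and `csq_header`, `csq_value`, and the example record are hypothetical, made up for this illustration.

```python
# Illustrative subset of VEP-style field names (assumption: NOT the PR's actual list).
CSQ_FIELDS = ["Allele", "Consequence", "SYMBOL", "Feature_type", "Feature"]


def csq_header(fields):
    """Build a VCF INFO header line that advertises the CSQ field order."""
    description = "Consequence annotations. Format: " + "|".join(fields)
    return '##INFO=<ID=CSQ,Number=.,Type=String,Description="{}">'.format(description)


def csq_value(record, fields):
    """Join one annotation record in header order.

    Missing fields become empty strings so every value keeps its position.
    """
    return "|".join(str(record.get(name, "")) for name in fields)


# Made-up record, for illustration only.
example = {"Allele": "A", "SYMBOL": "BRAF", "Consequence": "missense_variant"}
header = csq_header(CSQ_FIELDS)
value = csq_value(example, CSQ_FIELDS)
```

Keeping the field list in a single place means a rename or reorder changes the header and the values together.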

HGVS.c and HGVS.p are another set of predefined fields we could consider adding to our annotations. We could extract them from variant aliases. This would be pretty straightforward, I believe, but we would need to decide how to handle multiple variant aliases that match each type (e.g. for build 37 vs build 38) since we can only pick one.

ahwagner commented 5 years ago

For HGVS.p and HGVS.c, we can probably use variant.hgvs_expressions instead of variant.aliases. It's a minor help, but I wanted to point out that we already have that much.

We should definitely stick to build 37 (GRCh37) if we need to pick only one.
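A minimal sketch of the "pick one" step, assuming the HGVS expressions arrive as a plain list of strings shaped like `accession:c.change` or `accession:p.change`. The helper `pick_hgvs` is hypothetical and not part of civicpy, and the input list is illustrative. It simply keeps the first coding and first protein expression it sees; it does not attempt the build 37 vs build 38 choice, which would need its own rule.

```python
def pick_hgvs(expressions):
    """Return the first coding (c.) and first protein (p.) HGVS expression.

    Hypothetical helper: `expressions` is assumed to be a list of strings.
    Fields with no matching expression stay empty.
    """
    picked = {"HGVS.c": "", "HGVS.p": ""}
    for expression in expressions:
        # Split "accession:change"; `change` is "" if there is no colon.
        _, _, change = expression.partition(":")
        if change.startswith("c.") and not picked["HGVS.c"]:
            picked["HGVS.c"] = expression
        elif change.startswith("p.") and not picked["HGVS.p"]:
            picked["HGVS.p"] = expression
    return picked


# Illustrative input only.
expressions = [
    "NC_000007.13:g.140453136A>T",
    "NM_004333.4:c.1799T>A",
    "ENST00000288602.6:c.1799T>A",
    "NP_004324.2:p.Val600Glu",
]
result = pick_hgvs(expressions)
```

Taking the first match keeps the choice deterministic for a given input order, so whatever ordering rule we agree on can be applied to the list before calling it.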