igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

varianttable.py: add feature ID and nucleotide modification code to ANN format field printed entries #27

Closed: dlaehnemann closed this 5 years ago

dlaehnemann commented 5 years ago

Another little pull-request to enhance the ANN format field parsing:

varianttable.py: When parsing the ANN format field, add the feature ID (e.g. the transcript ID) and the nucleotide modification code to the text displayed in the igv-reports table. The feature ID in particular is more useful when you can copy and paste it for further in-depth searches, which is not possible from the tooltip.
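To illustrate the idea, here is a minimal sketch of pulling the feature ID and the nucleotide modification code out of an ANN value by fixed position. It is not the code in varianttable.py: the function name `ann_display_text`, the output layout, and the example annotation are all made up for illustration, and it assumes the nucleotide modification code is the `HGVS.c` sub-field and that sub-fields follow the order given in the SnpEff VCF annotation format (v1.0).

```python
# Positions of ANN sub-fields, assuming the order documented in the
# SnpEff VCF annotation format v1.0 (an assumption of this sketch).
ANNOTATION = 1
GENE_NAME = 3
FEATURE_ID = 6   # e.g. a transcript ID
HGVS_C = 9       # assumed to be the "nucleotide modification code"


def ann_display_text(ann_value):
    """Return one display string per annotation in an ANN INFO value.

    Hypothetical helper: the layout of the returned text is illustrative only.
    """
    texts = []
    for entry in ann_value.split(","):          # one entry per annotation
        fields = [f.strip() for f in entry.split("|")]
        if len(fields) <= HGVS_C:
            continue                            # skip entries that are too short
        texts.append(
            f"{fields[GENE_NAME]} {fields[FEATURE_ID]} "
            f"{fields[HGVS_C]} ({fields[ANNOTATION]})"
        )
    return texts


# Made-up annotation with 16 sub-fields, not taken from a real VCF.
EXAMPLE_ANN = (
    "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1.1|"
    "protein_coding|2/5|c.100G>A|p.Val34Ile|100/900|100/900|34/299||"
)

print(ann_display_text(EXAMPLE_ANN))
```

Because the feature ID ends up in the visible text rather than only in a tooltip, it can be selected and copied directly.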

dlaehnemann commented 5 years ago

@jrobinso Just a quick bump-up, so that this doesn't get overlooked.

jrobinso commented 5 years ago

@dlaehnemann Sorry, this had been overlooked. Could you attach a small example file with ANN field(s), or add it to the test/data directory? Thanks.

jrobinso commented 5 years ago

The documentation is here; I'm noting it for future reference, as the field order is hardcoded: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf

dlaehnemann commented 5 years ago

@jrobinso, sorry for the slow response; I meant to answer sooner. That is exactly the documentation I based the changes on. I should have included it in the original PR message.

The field order is hard-coded, e.g. snpeff generates the following VCF header line to describe it:

##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">

For jannovar it's:

##INFO=<ID=ANN,Number=1,Type=String,Description="Functional annotations:'Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|Feature_Type|Feature_ID|Transcript_BioType|Rank|HGVS.c|HGVS.p|cDNA.pos / cDNA.length|CDS.pos / CDS.length|AA.pos / AA.length|Distance|ERRORS / WARNINGS / INFO'">

(This should also be Number=.; future jannovar versions should be consistent on this: https://github.com/charite/jannovar/pull/455)

So, in theory, the parsing could also be based on the header line, should the fixed order ever change. But for now, the fixed indexing should be fine. I'll also add two minimal VCF files with the ANN annotations for snpeff and jannovar before this PR is ready to merge.
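As a rough sketch of what header-based parsing could look like (not something this PR implements), the sub-field names can be read from the `Description` of the ANN header line and then used to label each entry. The function names `ann_field_names` and `parse_ann_entry` and the example entry are hypothetical; the two header strings are the snpeff and jannovar lines quoted above.

```python
import re

SNPEFF_HEADER = """##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">"""

JANNOVAR_HEADER = """##INFO=<ID=ANN,Number=1,Type=String,Description="Functional annotations:'Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|Feature_Type|Feature_ID|Transcript_BioType|Rank|HGVS.c|HGVS.p|cDNA.pos / cDNA.length|CDS.pos / CDS.length|AA.pos / AA.length|Distance|ERRORS / WARNINGS / INFO'">"""


def ann_field_names(header_line):
    """Extract the ANN sub-field names from an ANN INFO header line."""
    # The names sit between single quotes after "Functional annotations:".
    match = re.search(r"Functional annotations:\s*'([^']*)'", header_line)
    if match is None:
        raise ValueError("no ANN field description found in header line")
    # Split on the pipe only, so names such as "cDNA.pos / cDNA.length" stay whole.
    return [name.strip() for name in match.group(1).split("|")]


def parse_ann_entry(entry, names):
    """Map sub-field names to values for a single ANN entry."""
    # zip() stops at the shorter sequence, so surplus values or names are dropped.
    return dict(zip(names, entry.split("|")))


names = ann_field_names(SNPEFF_HEADER)
# Made-up entry for illustration, not taken from a real VCF.
entry = (
    "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1.1|"
    "protein_coding|2/5|c.100G>A|p.Val34Ile|100/900|100/900|34/299||"
)
parsed = parse_ann_entry(entry, names)
print(parsed["Feature_ID"], parsed["HGVS.c"])
```

Looking fields up by name keeps the parser independent of position, at the cost of trusting that the header description is present and follows this quoting pattern.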

dlaehnemann commented 5 years ago

So, here are two minimal test files for snpeff and jannovar, on which tests could be based. I'm not sure where a test would go, though.

jrobinso commented 5 years ago

Thanks for the test files. I'll write some minimal test that at least parses them and checks for errors. I was initially concerned with all the hardcoded positions, but that's how it's documented so that's how we have to parse it.
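A minimal check of that kind might look like the sketch below. It is only an illustration under stated assumptions: `check_ann_entries` is a hypothetical helper, the VCF content is made up and held in memory rather than read from the attached files, and the expected count of 16 sub-fields is taken from the header lines quoted earlier in this thread.

```python
import io

EXPECTED_ANN_FIELDS = 16  # number of sub-fields named in the ANN header description


def check_ann_entries(vcf_handle):
    """Count the ANN entries in a VCF stream, raising on malformed ones."""
    checked = 0
    for line_no, line in enumerate(vcf_handle, start=1):
        if line.startswith("#") or not line.strip():
            continue                              # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]   # INFO is the 8th VCF column
        for item in info.split(";"):
            if not item.startswith("ANN="):
                continue
            for entry in item[len("ANN="):].split(","):
                n_fields = len(entry.split("|"))
                if n_fields != EXPECTED_ANN_FIELDS:
                    raise ValueError(
                        f"line {line_no}: ANN entry has {n_fields} fields, "
                        f"expected {EXPECTED_ANN_FIELDS}"
                    )
                checked += 1
    return checked


# Made-up single-record VCF with two ANN entries, for illustration only.
ENTRY = (
    "A|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1.1|"
    "protein_coding|2/5|c.100G>A|p.Val34Ile|100/900|100/900|34/299||"
)
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    f"chr1\t100\t.\tG\tA\t.\tPASS\tDP=10;ANN={ENTRY},{ENTRY}\n"
)

print(check_ann_entries(io.StringIO(EXAMPLE_VCF)))
```

Pointing the same function at an opened test file instead of the in-memory string would give a basic parse-and-check test.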

dlaehnemann commented 5 years ago

Sounds good to me. I'll delete the branch to keep the repo tidy.