robinandeer / puzzle

Variant caller GUI + genetic disease analysis
https://robinandeer.gitbooks.io/puzzle/content/
MIT License

Missing variant info from freebayes VCF #125

vasiliosz closed this issue 8 years ago

vasiliosz commented 8 years ago

I'm using the VCF plugin. Variants are called with freebayes and then annotated with VEP. Several fields appear to be missing from the puzzle display even though the VCF carries the same information in alternative forms. The AD field is not present in the original file, but I think the per-sample DP (total read depth) would still be worth showing in that case.

Better still, we could extend support for freebayes output. Relevant definitions from the VCF header:

##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally">
##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally">
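
A minimal sketch of reading these header definitions, assuming plain `##INFO=<...>` lines as shown above (`parse_info_header` is a hypothetical helper, not part of puzzle). It matters because `Number=A` means one value per alternate allele, so AO and PAO are comma-separated lists at multi-allelic sites, while RO and PRO hold a single value.

```python
import re

# One key=value pair inside the <...> of a VCF meta-information line.
# Quoted values (e.g. Description) may contain commas, so they are tried first.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,>]*)')


def parse_info_header(line):
    """Return the ID/Number/Type/Description fields of one ##INFO line."""
    prefix = "##INFO=<"
    if not (line.startswith(prefix) and line.endswith(">")):
        raise ValueError("not an ##INFO header line: %r" % line)
    body = line[len(prefix):-1]
    # Strip the surrounding quotes from quoted values such as Description.
    return {key: value.strip('"') for key, value in _PAIR.findall(body)}


headers = [
    '##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">',
    '##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">',
]
for line in headers:
    fields = parse_info_header(line)
    print(fields["ID"], fields["Number"], fields["Type"])
```

This prints `RO 1 Integer` and `AO A Integer`.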

The full record in the VCF:

[VZMBP uppmount]$ tabix my.vcf.gz 1:865628-865628 | less -S
1   865628  .   G   A   334.162 .   AB=0.454545;ABP=3.60252;AC=1;AF=0.166667;AN=6;AO=15;CIGAR=1X;DP=98;DPB=98;DPRA=1.01538;EPP=3.15506;EPPR=3.03646;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=3;NUMALT=1;ODDS=31.4439;PAIRED=0.933333;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=550;QR=3161;RO=83;RPL=10;RPP=6.62942;RPPR=4.29225;RPR=5;RUN=1;SAF=5;SAP=6.62942;SAR=10;SRF=38;SRP=4.29225;SRR=45;TYPE=snp;CSQ=upstream_gene_variant|||ENSG00000187634|SAMD11|ENST00000341065||||-/589|protein_coding,missense_variant|Ggt/Agt|G/S|ENSG00000187634|SAMD11|ENST00000342066|3/14|benign(0.099)|deleterious_low_confidence(0.01)|56/681|protein_coding,synonymous_variant|acC/acT|T|ENSG00000268179|AL645608.1|ENST00000598827|4/6|||38/112|protein_coding,missense_variant|Ggt/Agt|G/S|ENSG00000187634|SAMD11|ENST00000437963|3/5|benign(0.099)|deleterious_low_confidence(0.02)|56/109|protein_coding,missense_variant|Ggt/Agt|G/S|ENSG00000187634|SAMD11|ENST00000420190|3/7|benign(0.099)|deleterious_low_confidence(0.01)|56/179|protein_coding   GT:DP:RO:QR:AO:QA:GL    0/0:30:30:1145:0:0:0,-9.0309,-103.305   0/1:33:18:666:15:550:-39.8804,0,-50.3079    0/0:35:35:1350:0:0:0,-10.536,-121.739
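
A rough sketch of pulling the coverage fields out of a record like this one, assuming a standard tab-separated VCF data line (`parse_record` is a hypothetical helper, and the INFO column is shortened here for brevity). The FORMAT column `GT:DP:RO:QR:AO:QA:GL` already gives per-sample DP, RO and AO, so an AD-like view can be built from RO and AO.

```python
def parse_record(line):
    """Split one tab-separated VCF data line into INFO and per-sample dicts."""
    cols = line.rstrip("\n").split("\t")
    info = {}
    for entry in cols[7].split(";"):
        # Flag fields carry no '=value'; store them as True.
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    # Map each sample's colon-separated values onto the FORMAT keys.
    keys = cols[8].split(":")
    samples = [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]
    return info, samples


record = "\t".join([
    "1", "865628", ".", "G", "A", "334.162", ".",
    "AO=15;DP=98;RO=83;TYPE=snp",  # INFO shortened for this example
    "GT:DP:RO:QR:AO:QA:GL",
    "0/0:30:30:1145:0:0:0,-9.0309,-103.305",
    "0/1:33:18:666:15:550:-39.8804,0,-50.3079",
    "0/0:35:35:1350:0:0:0,-10.536,-121.739",
])

info, samples = parse_record(record)
for sample in samples:
    print(sample["GT"], sample["DP"], sample["RO"], sample["AO"])
```

This prints `0/0 30 30 0`, `0/1 33 18 15` and `0/0 35 35 0`. In this record RO + AO equals DP for every sample, though that need not hold in general.
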
vasiliosz commented 8 years ago

I think the relevant lines are here: variant_mixin.py#L384-L400
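
A minimal sketch of the fallback order suggested above, operating on a per-sample dict of FORMAT values. It does not reflect what variant_mixin.py actually contains; `allele_depths` and `total_depth` are hypothetical names. It prefers AD when present, otherwise builds the depths from freebayes RO/AO, and uses DP for total depth, falling back to the sum of the allele depths.

```python
def _counts(text):
    """Parse a comma-separated count list; None if absent or any value is '.'."""
    if not text:
        return None
    parts = text.split(",")
    if "." in parts:
        return None
    return [int(p) for p in parts]


def allele_depths(sample):
    """Return (ref_depth, [alt_depths]) for one sample, or None if unknown."""
    ad = _counts(sample.get("AD"))
    if ad:
        return ad[0], ad[1:]
    # freebayes: RO is a single count, AO has one count per alternate allele.
    ro = _counts(sample.get("RO"))
    ao = _counts(sample.get("AO"))
    if ro is None or ao is None:
        return None
    return ro[0], ao


def total_depth(sample):
    """Prefer DP; otherwise sum the allele depths if they are available."""
    dp = _counts(sample.get("DP"))
    if dp is not None:
        return dp[0]
    depths = allele_depths(sample)
    if depths is None:
        return None
    ref, alts = depths
    return ref + sum(alts)


het = {"GT": "0/1", "DP": "33", "RO": "18", "QR": "666", "AO": "15", "QA": "550"}
print(allele_depths(het), total_depth(het))
```

For the heterozygous sample in the record above this prints `(18, [15]) 33`.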

moonso commented 8 years ago

Sure, I'll give it a go. Do you have a small example VCF? You can send it on Slack or by mail.