quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License
56 stars 13 forks source link

vcf2db can't handle records that were skipped by vcfanno #59

Open lbeltrame opened 5 years ago

lbeltrame commented 5 years ago

This is a follow up to my early issue (#58) with some more data. Note that I'm not sure if the issue is in vcfanno or in vcf2db.

Sometimes, vcfanno would leave a record empty, for example (reference is hg38):

chr17   35095958        .       C       A       184.0   PASS    ADJAF=0;BIAS=2:2;CALLERS=vardict;DUPRATE=0;EPR=pass;HIAF=0.0158;HICNT=70;HICOV=4419;LSEQ=CTTCAGTGGGTTCCAGGCAG;MQ=60;MSI=2;MSILEN=1;NM=3.4;ODDRATIO=1.10572;PMEAN=45.4;PSTD=1;QSTD=1;QUAL=27.1;REFBIAS=2410:2527;RSEQ=CAACTATTCAAACCATTTTC;SAMPLE=21561T;SBF=0.63466;SHIFT3=1;SN=1.628;SPANPAIR=0;SPLITREAD=0;TYPE=SNV;VARBIAS=58:55;VD=113;ANN=A|downstream_gene_variant|MODIFIER|RAD51D|ENSG00000185379|transcript|ENST00000590016.5|protein_coding||c.*4995G>T|||||4714|,A|downstream_gene_variant|MODIFIER|RNU6-840P|ENSG00000253084|transcript|ENST00000517275.3|snRNA||n.*1985C>A|||||1985|,A|downstream_gene_variant|MODIFIER|Vault|ENSG00000252328|transcript|ENST00000516519.1|misc_RNA||n.*1602G>T|||||1602|,A|intron_variant|MODIFIER|RAD51L3-RFFL|ENSG00000267618|transcript|ENST00000593039.5|protein_coding|5/6|c.426+5243G>T||||||,A|intragenic_variant|MODIFIER|RAD51D|ENSG00000185379|gene_variant|ENSG00000185379|||n.35095958G>T||||||;DP=5058;AF=0.0223;AN=2;AC=1;CSQ=A|intron_variant|MODIFIER|AC004223.3|ENSG00000267618|Transcript|ENST00000593039|protein_coding||5/6|ENST00000593039.5:c.426+5243G>T|||||||||-1||SNV|Clone_based_ensembl_gene||YES|2|P1||ENSP00000466834||K7EN88|UPI00003733EF||Ensembl|C|C|||||||||||||||||||||||||||||||     GT:DP:VD:AD:AF:RD:ALD   ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:5058:113:4937,113:0.0223:2410,2527:58,55    ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:.    ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:.

This makes all the extra annotations(present in other records but missing there) as u''. This in turn is not liked by SQLalchemy, which errors:

sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'amr_af', in parameter group 1

In the dictionary, in fact, we find:

u'gnomad_amr_af': u''

Now, is this expected behavior? Should I file a bug against vcfanno instead?

brentp commented 5 years ago

vcf2db is complex enough that I cant/wont help out on these with a small vcf to recreate. same for #58