quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License

float() argument must be a string or number, spliceAI overlapping intron problem? #56

Closed Phillip-a-richmond closed 5 years ago

Phillip-a-richmond commented 5 years ago

This is somewhat similar to previous issues so I hope it's a quick fix.

It's the pipeline update time of year, and I think this issue is being caused by SpliceAI. Essentially, SpliceAI adds multiple scores for variants that overlap the introns of two different genes. That makes sense, but because these fields are stored as floats, any time a variant falls in overlapping genes vcf2db crashes and complains that it was expecting a float. This is just my hypothesis, based on the fact that when I remove a couple of these offending lines there is no issue with the rest of the arguments I'm using. I've attached a small VCF with at least one of these events, and pasted the errors and commands below.
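To make the hypothesis concrete, here is a minimal Python sketch (not vcf2db's actual code) of why a two-valued SpliceAI field cannot be bound to a float column, plus one possible workaround. The helper name `collapse_score` and the choice of `max` as the reducer are assumptions for illustration only; the tuple is copied from the SpliceAI_AcceptorGain entry in the error dump further down.

```python
def collapse_score(value, reducer=max):
    """Return a single float from a scalar or a sequence of scores.

    Hypothetical helper: `reducer` decides how several per-gene scores
    are merged; max keeps the strongest prediction.
    """
    if value is None:
        return None
    if isinstance(value, (tuple, list)):
        scores = [float(v) for v in value if v is not None]
        return reducer(scores) if scores else None
    return float(value)


# SpliceAI_AcceptorGain as it appears in the "bad record" dump.
acceptor_gain = (0.41990000009536743, 0.3418000042438507)

try:
    float(acceptor_gain)  # a float column cannot hold a tuple
    failed = False
except TypeError:
    failed = True

# One value per variant, which a float column can store.
collapsed = collapse_score(acceptor_gain)
```

Whether the maximum is the right summary depends on the analysis; passing `reducer=min` or a mean function would work the same way.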

As always thanks for your help and these amazing tools!

NA12878_Trio.merged.hc.norm.vcfanno.subset.vcf.gz

My command:

python $VCF2DB \
    --expand gt_quals --expand gt_depths --expand gt_alt_depths --expand gt_ref_depths --expand gt_types \
    --a-ok gnomad_exome_ac_global --a-ok gnomad_exome_ac_popmax --a-ok gnomad_exome_an_global --a-ok gnomad_exome_an_popmax --a-ok gnomad_exome_hom_controls --a-ok gnomad_exome_hom_global \
    --a-ok gnomad_exome_hom_popmax --a-ok gnomad_exome_popmax --a-ok gnomad_genome_ac_global --a-ok gnomad_genome_ac_popmax --a-ok gnomad_genome_an_global --a-ok gnomad_genome_an_popmax \
    --a-ok gnomad_genome_hom_controls --a-ok gnomad_genome_hom_global --a-ok gnomad_genome_hom_popmax --a-ok gnomad_genome_popmax \
    --a-ok InHouseDB_AC --a-ok in_segdup --a-ok AF --a-ok AC --a-ok AN --a-ok MLEAC --a-ok MLEAF --a-ok cpg_island --a-ok common_pathogenic --a-ok cse-hiseq --a-ok DS --a-ok ConfidentRegion \
    $ANNOVCF $PED_FILE $GEMINIDB

The error:

bad record: AC 2 AF 0.5 AN 4 ANN None BaseQRankSum None CADD 22.1 CADD_indel None CCR None ClippingRankSum 0.0 ConfidentRegion True DP 116 ExcessHet 3.01029992104 FATHMM-XF-NONCODING 0.672812 FS None GeneHancer None InHouseDB_AC None LOF (ZNF343|ENSG00000088876|6|0.33) MLEAC 1 MLEAF 0.5 MQ 60.0 MQRankSum 0.0 OLD_MULTIALLELIC None OLD_VARIANT None PrimateAI None QD None ReadPosRankSum None SOR None SpliceAI_AcceptorGain (0.41990000009536743, 0.3418000042438507) SpliceAI_AcceptorLoss (0.0, 0.6237000226974487) SpliceAI_DonorGain (0.0, 0.0) SpliceAI_DonorLoss (0.0, 0.0) aa_change aa_length None aaf 0.5 ac 2 af 0.5 af_1kg_afr 0.0 af_1kg_all 0.000998400035314 af_1kg_amr 0.0 af_1kg_eas 0.0 af_1kg_eur 0.00499999988824 af_1kg_sas 0.0 af_esp_aa 0.000453926011687 af_esp_all 0.00246040290222 af_esp_ea 0.00348837208003 alt A an 4 ann None baseqranksum None biotype protein_coding cadd 22.1 cadd_indel None call_rate 0.666666666667 ccr None chrom chr20 clinvar_dbInfo None clinvar_dbinfo None clinvar_disease_name None clinvar_pathogenic None clippingranksum 0.0 codon_change c.305-2A>T confidentregion True cosmic_ids None cpg_island False cse-hiseq None cse_hiseq False dgv CopyNumber dp 116 ds False eQTL_GTEX_WholeBloodv7 None effect_severity HIGH encode_consensus_gm12878 T encode_consensus_h1hesc R encode_consensus_helas3 T encode_consensus_hepg2 T encode_consensus_huvec T encode_consensus_k562 T end 2465304 ensembl_gene_id None eqtl_gtex_wholebloodv7 None excesshet 3.01029992104 exon 5/5 fathmm_xf_noncoding 0.672812 filter None fitcons 0.106100000441 fs None gene ZNF343 genehancer None gerp_elements None gnomad_exome_AF_controls 0.00230020005256 gnomad_exome_ac_global 564 gnomad_exome_ac_popmax 0.00406019994989 gnomad_exome_af_controls 0.00230020005256 gnomad_exome_af_global 0.00260069989599 gnomad_exome_af_popmax 399 gnomad_exome_an_global 216862 gnomad_exome_an_popmax 1 gnomad_exome_hom_controls 0 gnomad_exome_hom_global 1 gnomad_exome_hom_popmax 98272 gnomad_exome_popmax nfe 
gnomad_genome_AF_controls 0.00222019990906 gnomad_genome_ac_global 84 gnomad_genome_ac_popmax 0.00462119979784 gnomad_genome_af_controls 0.00222019990906 gnomad_genome_af_global 0.00268699997105 gnomad_genome_af_popmax 71 gnomad_genome_an_global 31262 gnomad_genome_an_popmax 0 gnomad_genome_hom_controls 0 gnomad_genome_hom_global 0 gnomad_genome_hom_popmax 15364 gnomad_genome_popmax nfe gt_alt_depths i ,???? 'u???J??yJ?? d\?V gt_depths i ,;????7 gt_phases ? gt_quals f ,?B???B gt_ref_depths i ,???? t_types i , gts S (T/A./.T/A gwas_pubmed_trait None hapmap1 1.1468000412 hapmap2 8.50739955902 impact splice_acceptor_variant impact_severity HIGH impact_so splice_acceptor_variant in_rlcr False in_segdup False inhousedb_ac None is_canonical False is_coding False is_exonic False is_lof True is_splicing True lof (ZNF343|ENSG00000088876|6|0.33) mleac 1 mleaf 0.5 mq 60.0 mqranksum 0.0 num_het 2 num_hom_alt 0 num_hom_ref 0 num_unknown 1 old_multiallelic None old_variant None polyphen_pred None polyphen_score None pp2hdiv None pp2hvar None primateai None qd None qual 985.799987793 readposranksum None ref T rmsk None rs_ids rs73085335 set variant-variant3 sift_pred None sift_score None so splice_acceptor_variant sor None spliceai_acceptorgain (0.41990000009536743, 0.3418000042438507) spliceai_acceptorloss (0.0, 0.6237000226974487) spliceai_donorgain (0.0, 0.0) spliceai_donorloss (0.0, 0.0) stam_mean 5.12739992142 stam_names Melano start 2465303 sub_type tv tfbs None top_consequence splice_acceptor_variant transcript ENST00000278772 type snp variant_id 6912 vcf_id None Traceback (most recent call last): File "/opt/tools/vcf2db/vcf2db.py", line 923, in impacts_extras=a.impacts_field, aok=a.a_ok) File "/opt/tools/vcf2db/vcf2db.py", line 233, in init self.load() File "/opt/tools/vcf2db/vcf2db.py", line 318, in load i = self._load(self.cache, create=True, start=1) File "/opt/tools/vcf2db/vcf2db.py", line 311, in _load self.insert(variants, expanded, keys, i, create=create) File 
"/opt/tools/vcf2db/vcf2db.py", line 373, in insert vilengths, variant_impacts) File "/opt/tools/vcf2db/vcf2db.py", line 401, in _insert self.__insert(v_objs, self.metadata.tables['variants'].insert()) File "/opt/tools/vcf2db/vcf2db.py", line 435, in __insert raise e sqlalchemy.exc.StatementError: (exceptions.TypeError) float() argument must be a string or a number [SQL: u'INSERT INTO variants (variant_id, chrom, start, "end", vcf_id, ref, alt, qual, filter, type, sub_type, call_rate, num_hom_ref, num_het, num_hom_alt, num_unknown, aaf, gene, ensembl_gene_id, transcript, is_exonic, is_coding, is_lof, is_splicing, is_canonical, exon, codon_change, aa_change, aa_length, biotype, impact, impact_so, impact_severity, polyphen_pred, polyphen_score, sift_pred, sift_score, ac, af, an, baseqranksum, cadd, cadd_indel, ccr, clippingranksum, confidentregion, dp, ds, excesshet, fathmm_xf_noncoding, fs, genehancer, inhousedb_ac, lof, mleac, mleaf, mq, mqranksum, old_multiallelic, old_variant, primateai, qd, readposranksum, sor, spliceai_acceptorgain, spliceai_acceptorloss, spliceai_donorgain, spliceai_donorloss, af_1kg_afr, af_1kg_all, af_1kg_amr, af_1kg_eas, af_1kg_eur, af_1kg_sas, af_esp_aa, af_esp_all, af_esp_ea, clinvar_dbinfo, clinvar_disease_name, clinvar_pathogenic, cosmic_ids, cpg_island, cse_hiseq, dgv, eqtl_gtex_wholebloodv7, encode_consensus_gm12878, encode_consensus_h1hesc, encode_consensus_helas3, encode_consensus_hepg2, encode_consensus_huvec, encode_consensus_k562, fitcons, gerp_elements, gnomad_exome_af_controls, gnomad_exome_ac_global, gnomad_exome_ac_popmax, gnomad_exome_af_global, gnomad_exome_af_popmax, gnomad_exome_an_global, gnomad_exome_an_popmax, gnomad_exome_hom_controls, gnomad_exome_hom_global, gnomad_exome_hom_popmax, gnomad_exome_popmax, gnomad_genome_af_controls, gnomad_genome_ac_global, gnomad_genome_ac_popmax, gnomad_genome_af_global, gnomad_genome_af_popmax, gnomad_genome_an_global, gnomad_genome_an_popmax, gnomad_genome_hom_controls, 
gnomad_genome_hom_global, gnomad_genome_hom_popmax, gnomad_genome_popmax, gwas_pubmed_trait, hapmap1, hapmap2, in_rlcr, in_segdup, pp2hdiv, pp2hvar, rmsk, rs_ids, "set", stam_mean, stam_names, tfbs, gts, gt_types, gt_phases, gt_depths, gt_ref_depths, gt_alt_depths, gt_quals, gt_alt_freqs) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: [{u'gnomad_genome_hom_controls': 0, u'gnomad_exome_af_global': 0.002600699895992875, u'gnomad_exome_AF_controls': 0.0023002000525593758, u'InHouseDB_AC': None, u'CCR': None, 'gt_phases': <read-only buffer for 0x7fae4e95e930, size -1, offset 0 at 0x7fae4e960270>, u'cse-hiseq': None, u'clinvar_dbInfo': None, 'variant_id': 6912, 'alt': u'A', u'gnomad_exome_hom_controls': 0, u'eQTL_GTEX_WholeBloodv7': None, 'num_unknown': 1, u'spliceai_donorloss': (0.0, 0.0), 'codon_change': u'c.305-2A>T', 'gt_types': <read-only buffer for 0x7fae4e95d180, size -1, offset 0 at 0x7fae4e956c70>, 'is_lof': True, 'ds': False, 'gts': 'S\x0b(T/A\x00./.\x00T/A', u'gnomad_exome_af_popmax': 399, u'gnomad_genome_hom_global': 0, u'af_1kg_sas': 0.0, u'gnomad_genome_ac_global': 84, 'is_exonic': False, u'primateai': None, 'exon': u'5/5', u'clinvar_disease_name': None, 'ensembl_gene_id': None, 'chrom': u'chr20', 'polyphen_score': None, u'dp': 116, u'readposranksum': None, u'spliceai_donorgain': (0.0, 0.0), 'is_canonical': False, u'rmsk': None, u'gnomad_genome_popmax': u'nfe', u'gnomad_genome_an_popmax': 0, u'encode_consensus_k562': u'T', 'num_het': 2, u'old_variant': 'None', 'sift_pred': None, 'gt_depths': <read-only buffer for 0x7fae4e95d1b8, size -1, offset 0 at 0x7fae4e960230>, 
u'GeneHancer': None, 'effect_severity': 'HIGH', u'stam_names': u'Melano', u'SpliceAI_DonorGain': (0.0, 0.0), u'gnomad_genome_af_controls': 0.0022201999090611935, u'set': u'variant-variant3', 'vcf_id': None, 'gt_quals': <read-only buffer for 0x7fae4e95d260, size -1, offset 0 at 0x7fae4e960330>, u'tfbs': None, u'ConfidentRegion': True, u'pp2hvar': None, u'spliceai_acceptorloss': (0.0, 0.6237000226974487), u'ExcessHet': 3.0102999210357666, u'gnomad_genome_af_global': 0.002686999971047044, 'gt_ref_depths': <read-only buffer for 0x7fae4e95d1f0, size -1, offset 0 at 0x7fae4e9602b0>, 'call_rate': 0.6666666666666666, u'af_1kg_all': 0.000998400035314262, u'clinvar_dbinfo': 'None', u'encode_consensus_gm12878': u'T', u'af_esp_all': 0.0024604029022157192, u'gnomad_genome_an_global': 31262, 'ref': u'T', 'gt_alt_freqs': <read-only buffer for 0x7fae4e960370, size -1, offset 0 at 0x7fae4e9603b0>, u'ClippingRankSum': 0.0, u'qd': None, u'stam_mean': 5.127399921417236, 'impact': u'splice_acceptor_variant', u'gnomad_genome_AF_controls': 0.0022201999090611935, u'mqranksum': 0.0, u'MLEAC': 1, u'af_esp_ea': 0.003488372080028057, u'old_multiallelic': None, u'lof': u'(ZNF343|ENSG00000088876|6|0.33)', 'sub_type': 'tv', u'encode_consensus_hepg2': u'T', u'excesshet': 3.0102999210357666, u'encode_consensus_helas3': u'T', u'ANN': None, u'af_1kg_eur': 0.004999999888241291, u'SpliceAI_AcceptorGain': (0.41990000009536743, 0.3418000042438507), 'filter': None, 'aa_length': None, u'gnomad_genome_af_popmax': 71, u'MQ': 60.0, u'gnomad_exome_hom_global': 1, u'hapmap2': 8.507399559020996, u'hapmap1': 1.1468000411987305, u'SpliceAI_AcceptorLoss': (0.0, 0.6237000226974487), u'FS': None, u'gerp_elements': None, 'top_consequence': u'splice_acceptor_variant', u'gnomad_exome_an_global': 216862, 'num_hom_ref': 0, u'gnomad_genome_ac_popmax': 0.004621199797838926, 'is_splicing': True, u'PrimateAI': None, u'cadd': u'22.1', u'cosmic_ids': None, u'rs_ids': u'rs73085335', u'fitcons': 0.10610000044107437, 
u'af_1kg_eas': 0.0, u'MLEAF': 0.5, u'gnomad_exome_hom_popmax': 98272, u'af': 0.5, 'polyphen_pred': None, u'cadd_indel': None, u'genehancer': None, 'start': 2465303, 'sift_score': None, u'OLD_MULTIALLELIC': None, 'type': 'snp', u'af_1kg_afr': 0.0, u'MQRankSum': 0.0, 'impact_severity': 'HIGH', u'pp2hdiv': None, u'gnomad_genome_hom_popmax': 15364, u'inhousedb_ac': None, 'qual': 985.7999877929688, u'spliceai_acceptorgain': (0.41990000009536743, 0.3418000042438507), u'fathmm_xf_noncoding': u'0.672812', u'confidentregion': True, u'baseqranksum': None, 'aaf': 0.5, u'encode_consensus_huvec': u'T', u'in_rlcr': False, 'gt_alt_depths': <read-only buffer for 0x7fae4e95d228, size -1, offset 0 at 0x7fae4e9602f0>, u'mq': 60.0, 'num_hom_alt': 0, u'clinvar_pathogenic': None, u'in_segdup': False, u'ac': 2, u'BaseQRankSum': None, u'SpliceAI_DonorLoss': (0.0, 0.0), u'mleac': 1, u'ann': None, u'gnomad_exome_popmax': u'nfe', u'an': 4, u'encode_consensus_h1hesc': u'R', u'CADD': u'22.1', 'cse_hiseq': False, u'mleaf': 0.5, u'sor': None, u'FATHMM-XF-NONCODING': u'0.672812', u'DP': 116, u'af_1kg_amr': 0.0, u'gnomad_exome_an_popmax': 1, 'end': 2465304, u'gwas_pubmed_trait': None, u'LOF': u'(ZNF343|ENSG00000088876|6|0.33)', u'gnomad_exome_ac_popmax': 0.004060199949890375, u'OLD_VARIANT': None, u'CADD_indel': None, u'SOR': None, u'clippingranksum': 0.0, u'eqtl_gtex_wholebloodv7': 'None', u'AC': 2, u'fs': None, 'is_coding': False, u'gnomad_exome_af_controls': 0.0023002000525593758, u'AF': 0.5, u'AN': 4, u'dgv': u'CopyNumber', 'biotype': u'protein_coding', 'transcript': u'ENST00000278772', u'ReadPosRankSum': None, 'gene': u'ZNF343', 'aa_change': u'', u'ccr': None, u'af_esp_aa': 0.00045392601168714464, 'so': u'splice_acceptor_variant', u'QD': None, 'impact_so': u'splice_acceptor_variant', u'gnomad_exome_ac_global': 564, u'cpg_island': False}]]

My VCF Header:

fileformat=VCFv4.1

contig=

FILTER==5 && FORMAT/DP[*] < 600">

FILTER=

INFO=

FORMAT=

GATKCommandLine.CombineVariants=<ID=CombineVariants,Version=3.4-46-gbc02625,Date="Wed Mar 20 14:20:36 PDT 2019",Epoch=1553116836050,CommandLineOptions="analysis_type=CombineVariants input_file=[] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/mnt/causes-vnx1/GENOMES/hg19/FASTA/hg19.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false variant=[(RodBindingCollection [(RodBinding name=variant source=/mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12878_BWAmem_dupremoved_realigned_HaplotypeCaller_chr20.vcf)]), (RodBindingCollection [(RodBinding name=variant2 
source=/mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12891_BWAmem_dupremoved_realigned_HaplotypeCaller_chr20.vcf)]), (RodBindingCollection [(RodBinding name=variant3 source=/mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12892_BWAmem_dupremoved_realigned_HaplotypeCaller_chr20.vcf)])] out=/mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12878_Trio.merged.hc.vcf genotypemergeoption=UNSORTED filteredrecordsmergetype=KEEP_IF_ANY_UNFILTERED multipleallelesmergetype=BY_TYPE rod_priority_list=null printComplexMerges=false filteredAreUncalled=false minimalVCF=false excludeNonVariants=false setKey=set assumeIdenticalSamples=false minimumN=1 suppressCommandLineHeader=false mergeInfoWithMaxAC=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">

GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Thu Aug 10 05:35:30 PDT 2017",Epoch=1502368530657,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/scratch/richmonp/PROCESS/NA12892_BWAmem_dupremoved_realigned.sorted.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/project/projects/def-wyeth/GENOME/ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=32 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false 
out=/scratch/richmonp/PROCESS/NA12892_BWAmem_dupremoved_realigned_HaplotypeCaller.vcf likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[] excludeAnnotation=[] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=10.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=true keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">

reference=file:///mnt/causes-vnx1/GENOMES/hg19/FASTA/hg19.fa

SnpEffVersion="4.1l (build 2015-10-03), by Pablo Cingolani"

SnpEffCmd="SnpEff GRCh37.75 "

bcftools_filterVersion=1.8+htslib-1.8

bcftools_filterCommand=filter --include 'FORMAT/AD[:1]>=5 && FORMAT/DP[*] < 600' -m + -s + -O z --output /mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12878_Trio.merged.hc.norm.filter.vcf.gz /mnt/causes-vnx1/PIPELINES/AnnotateVariants/Test/NA12878_Trio.merged.hc.norm.vcf.gz; Date=Wed Mar 20 14:22:41 2019

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878_BWAmem NA12891_BWAmem NA12892_BWAmem

My offending line (NA12878 trio):

chr20 2465304 . T A 985.8 PASS AC=2;AF=0.5;AN=4;ClippingRankSum=0;DP=116;ExcessHet=3.0103;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;set=variant-variant3;ANN=A|splice_acceptor_variant&intron_variant|HIGH|ZNF343|ENSG00000088876|transcript|ENST00000278772|protein_coding|5/5|c.305-2A>T||||||,A|splice_acceptor_variant&intron_variant|HIGH|ZNF343|ENSG00000088876|transcript|ENST00000465019|retained_intron|1/1|n.333-2A>T||||||,A|splice_acceptor_variant&intron_variant|HIGH|ZNF343|ENSG00000088876|transcript|ENST00000445484|protein_coding|6/6|c.305-2A>T||||||WARNING_TRANSCRIPT_INCOMPLETE,A|intron_variant|MODIFIER|RP4-734P14.4|ENSG00000256566|transcript|ENST00000461548|nonsense_mediated_decay|5/6|n.304+8041A>T||||||;LOF=(ZNF343|ENSG00000088876|6|0.33);SpliceAI_AcceptorGain=0.4199,0.3418;SpliceAI_AcceptorLoss=0,0.6237;SpliceAI_DonorGain=0,0;SpliceAI_DonorLoss=0,0;FATHMM-XF-NONCODING=0.672812;gnomad_genome_af_global=0.002687;gnomad_genome_hom_global=0;gnomad_genome_ac_global=84;gnomad_genome_an_global=31262;gnomad_genome_popmax=nfe;gnomad_genome_af_popmax=71;gnomad_genome_hom_popmax=15364;gnomad_genome_ac_popmax=0.0046212;gnomad_genome_an_popmax=0;gnomad_genome_AF_controls=0.0022202;gnomad_genome_hom_controls=0;gnomad_exome_af_global=0.0026007;gnomad_exome_hom_global=1;gnomad_exome_ac_global=564;gnomad_exome_an_global=216862;gnomad_exome_popmax=nfe;gnomad_exome_af_popmax=399;gnomad_exome_hom_popmax=98272;gnomad_exome_ac_popmax=0.0040602;gnomad_exome_an_popmax=1;gnomad_exome_AF_controls=0.0023002;gnomad_exome_hom_controls=0;af_esp_ea=0.003488372;af_esp_aa=0.000453926;af_esp_all=0.002460403;rs_ids=rs73085335;af_1kg_amr=0;af_1kg_eas=0;af_1kg_sas=0;af_1kg_afr=0;af_1kg_eur=0.005;af_1kg_all=0.0009984;fitcons=0.1061;encode_consensus_gm12878=T;encode_consensus_h1hesc=R;encode_consensus_helas3=T;encode_consensus_hepg2=T;encode_consensus_huvec=T;encode_consensus_k562=T;dgv=CopyNumber;hapmap1=1.1468;hapmap2=8.5074;stam_mean=5.1274;stam_names=Melano;CADD=22.1;ConfidentRegion GT:AD:DP:GQ:PL 
0/1:27,28:55:99:1014,0,861 ./.:.:.:.:. 0/1:30,29:59:99:1026,0,1060
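For anyone wanting to locate such records before loading, the following is a rough Python sketch that flags INFO keys carrying more than one comma-separated value. The function name `multi_valued_keys` and the `SpliceAI_` prefix filter are assumptions made here (the prefix keeps legitimately multi-valued fields such as ANN out of the result); the INFO string is a shortened copy of the offending line above.

```python
def multi_valued_keys(info, prefix="SpliceAI_"):
    """Return {key: [raw values]} for prefixed INFO keys with several values."""
    hits = {}
    for entry in info.split(";"):
        if "=" not in entry:
            continue  # flag fields such as ConfidentRegion carry no value
        key, _, raw = entry.partition("=")
        values = raw.split(",")
        if key.startswith(prefix) and len(values) > 1:
            hits[key] = values
    return hits


# Shortened INFO column from the chr20:2465304 record.
info = (
    "AC=2;AF=0.5;SpliceAI_AcceptorGain=0.4199,0.3418;"
    "SpliceAI_AcceptorLoss=0,0.6237;SpliceAI_DonorGain=0,0;"
    "SpliceAI_DonorLoss=0,0;CADD=22.1;ConfidentRegion"
)
hits = multi_valued_keys(info)
```

Here `hits` holds all four SpliceAI keys, each with two values, which matches the tuples shown in the error dump.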

From SpliceAI file:

20 2465304 . T C . . SYMBOL=ZNF343;STRAND=-;TYPE=I;DIST=-2;DS_AG=0.0900;DS_AL=0.6239;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-5;DP_AL=-2;DP_DG=1;DP_DL=-2

20 2465304 . T A . . SYMBOL=RP4-734P14.4;STRAND=-;TYPE=I;DIST=8041;DS_AG=0.4199;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2

20 2465304 . T A . . SYMBOL=ZNF343;STRAND=-;TYPE=I;DIST=-2;DS_AG=0.3418;DS_AL=0.6237;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2

20 2465304 . T G . . SYMBOL=RP4-734P14.4;STRAND=-;TYPE=I;DIST=8041;DS_AG=0.4720;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2

20 2465304 . T G . . SYMBOL=ZNF343;STRAND=-;TYPE=I;DIST=-2;DS_AG=0.3648;DS_AL=0.6240;DS_DG=0.0000;DS_DL=0.0000;DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2
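The T>A allele appears twice in the SpliceAI file, once per gene, which is where the two-valued annotations come from. One way to get back to a single value per allele would be to reduce the per-gene records before annotation. The sketch below does this in plain Python; the function name `max_scores_per_allele` and the decision to keep the per-score maximum are assumptions, not something SpliceAI, vcfanno, or vcf2db does for you.

```python
from collections import defaultdict

SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def max_scores_per_allele(lines):
    """Keep the highest value of each delta score per (chrom, pos, ref, alt)."""
    best = defaultdict(dict)
    for line in lines:
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.split()[:8]
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        key = (chrom, pos, ref, alt)
        for name in SCORE_KEYS:
            score = float(fields[name])
            if score > best[key].get(name, float("-inf")):
                best[key][name] = score
    return dict(best)


# The two T>A records from the SpliceAI file, one per overlapping gene.
records = [
    "20 2465304 . T A . . SYMBOL=RP4-734P14.4;STRAND=-;TYPE=I;DIST=8041;"
    "DS_AG=0.4199;DS_AL=0.0000;DS_DG=0.0000;DS_DL=0.0000;"
    "DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2",
    "20 2465304 . T A . . SYMBOL=ZNF343;STRAND=-;TYPE=I;DIST=-2;"
    "DS_AG=0.3418;DS_AL=0.6237;DS_DG=0.0000;DS_DL=0.0000;"
    "DP_AG=-5;DP_AL=-2;DP_DG=-17;DP_DL=-2",
]
best = max_scores_per_allele(records)
```

Note that taking the maximum independently for each score mixes values from different genes (DS_AG from RP4-734P14.4, DS_AL from ZNF343), so the gene symbol would need to be tracked separately if it matters downstream.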

One more offending line, and the error:

chr20 62306757 . C T 673.8 PASS AC=2;AF=0.5;AN=4;ClippingRankSum=0;DP=81;ExcessHet=3.0103;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;set=variant-variant2;ANN=T|downstream_gene_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000356810|protein_coding||c.1311C>T|||||1311|WARNING_TRANSCRIPT_INCOMPLETE,T|downstream_gene_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000463361|processed_transcript||n.1352C>T|||||1352|,T|intron_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000318100|protein_coding|10/35|c.919+1311C>T||||||,T|intron_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000370018|protein_coding|10/34|c.919+1311C>T||||||,T|intron_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000508582|protein_coding|10/34|c.991+1311C>T||||||,T|intron_variant|MODIFIER|RTEL1|ENSG00000258366|transcript|ENST00000360203|protein_coding|10/34|c.919+1311C>T||||||,T|intron_variant|MODIFIER|RTEL1-TNFRSF6B|ENSG00000026036|transcript|ENST00000492259|nonsense_mediated_decay|9/34|n.919+1311C>T||||||,T|intron_variant|MODIFIER|RTEL1-TNFRSF6B|ENSG00000026036|transcript|ENST00000482936|nonsense_mediated_decay|9/36|n.919+1311C>T||||||;eQTL_GTEX_WholeBloodv7=STMN3;SpliceAI_AcceptorGain=0.0001,0.0001;SpliceAI_AcceptorLoss=0,0;SpliceAI_DonorGain=0.1032,0.1032;SpliceAI_DonorLoss=0,0;FATHMM-XF-NONCODING=0.029967;gnomad_genome_af_global=0.4517;gnomad_genome_hom_global=3344;gnomad_genome_ac_global=14152;gnomad_genome_an_global=31330;gnomad_genome_popmax=nfe;gnomad_genome_af_popmax=7602;gnomad_genome_hom_popmax=15396;gnomad_genome_ac_popmax=0.4938;gnomad_genome_an_popmax=1874;gnomad_genome_AF_controls=0.46;gnomad_genome_hom_controls=1208;InHouseDB_AC=18;rs_ids=rs56053617;af_1kg_amr=0.4207;af_1kg_eas=0.1161;af_1kg_sas=0.2689;af_1kg_afr=0.3812;af_1kg_eur=0.5129;af_1kg_all=0.3379;fitcons=0.1061;encode_consensus_gm12878=T;encode_consensus_h1hesc=T;encode_consensus_helas3=T;encode_consensus_hepg2=T;encode_consensus_huvec=T;encode_consensus_k562=T;dgv=CopyNumber;hapmap1=1.8136
;hapmap2=107.2328;CADD=2.623;ConfidentRegion GT:AD:DP:GQ:PL 0/1:19,21:40:99:702,0,653 0/1:23,18:41:99:570,0,792 ./.:.:.:.:.

bad record: AC 2 AF 0.5 AN 4 ANN None BaseQRankSum None CADD 2.623 CADD_indel None CCR None ClippingRankSum 0.0 ConfidentRegion True DP 81 ExcessHet 3.01029992104 FATHMM-XF-NONCODING 0.029967 FS None GeneHancer None InHouseDB_AC 18 LOF None MLEAC 1 MLEAF 0.5 MQ 60.0 MQRankSum 0.0 NMD None OLD_MULTIALLELIC None OLD_VARIANT None PrimateAI None QD None ReadPosRankSum None SOR None SpliceAI_AcceptorGain (9.999999747378752e-05, 9.999999747378752e-05) SpliceAI_AcceptorLoss (0.0, 0.0) SpliceAI_DonorGain (0.10320000350475311, 0.10320000350475311) SpliceAI_DonorLoss (0.0, 0.0) aa_change aa_length None aaf 0.5 ac 2 af 0.5 af_1kg_afr 0.381199985743 af_1kg_all 0.337900012732 af_1kg_amr 0.420700013638 af_1kg_eas 0.116099998355 af_1kg_eur 0.51289999485 af_1kg_sas 0.26890000701 af_esp_aa -1.0 af_esp_all -1.0 af_esp_ea -1.0 alt T an 4 ann None baseqranksum None biotype protein_coding cadd 2.623 cadd_indel None call_rate 0.666666666667 ccr None chrom chr20 clinvar_dbInfo None clinvar_dbinfo None clinvar_disease_name None clinvar_pathogenic None clippingranksum 0.0 codon_change c.919+1311C>T confidentregion True cosmic_ids None cpg_island False cse-hiseq None cse_hiseq False dgv CopyNumber dp 81 ds False eQTL_GTEX_WholeBloodv7 STMN3 effect_severity LOW encode_consensus_gm12878 T encode_consensus_h1hesc T encode_consensus_helas3 T encode_consensus_hepg2 T encode_consensus_huvec T encode_consensus_k562 T end 62306757 ensembl_gene_id None eqtl_gtex_wholebloodv7 STMN3 excesshet 3.01029992104 exon 10/35 fathmm_xf_noncoding 0.029967 filter None fitcons 0.106100000441 fs None gene RTEL1 genehancer None gerp_elements None gnomad_exome_AF_controls None gnomad_exome_ac_global None gnomad_exome_ac_popmax None gnomad_exome_af_controls -1.0 gnomad_exome_af_global -1.0 gnomad_exome_af_popmax -1.0 gnomad_exome_an_global None gnomad_exome_an_popmax None gnomad_exome_hom_controls None gnomad_exome_hom_global None gnomad_exome_hom_popmax None gnomad_exome_popmax None gnomad_genome_AF_controls 
0.460000008345 gnomad_genome_ac_global 14152 gnomad_genome_ac_popmax 0.493800014257 gnomad_genome_af_controls 0.460000008345 gnomad_genome_af_global 0.451700001955 gnomad_genome_af_popmax 7602 gnomad_genome_an_global 31330 gnomad_genome_an_popmax 1874 gnomad_genome_hom_controls 1208 gnomad_genome_hom_global 3344 gnomad_genome_hom_popmax 15396 gnomad_genome_popmax nfe gt_alt_depths i ,???? gt_alt_freqs dD??????????????? gt_depths i ,????)( gt_phases ? gt_quals f ,???B?B gt_ref_depths i ,???? gt_types i , gts S (./.C/TC/T gwas_pubmed_trait None hapmap1 1.81359994411 hapmap2 107.232803345 impact intron_variant impact_severity LOW impact_so intron_variant in_rlcr False in_segdup False inhousedb_ac 18 is_canonical False is_coding False is_exonic False is_lof False is_splicing False lof None mleac 1 mleaf 0.5 mq 60.0 mqranksum 0.0 nmd None num_het 2 num_hom_alt 0 num_hom_ref 0 num_unknown 1 old_multiallelic None old_variant None polyphen_pred None polyphen_score None pp2hdiv None pp2hvar None primateai None qd None qual 673.799987793 readposranksum None ref C rmsk None rs_ids rs56053617 set variant-variant2 sift_pred None sift_score None so intron_variant sor None spliceai_acceptorgain (9.999999747378752e-05, 9.999999747378752e-05) spliceai_acceptorloss (0.0, 0.0) spliceai_donorgain (0.10320000350475311, 0.10320000350475311) spliceai_donorloss (0.0, 0.0) stam_mean None stam_names None start 62306756 sub_type ts tfbs None top_consequence intron_variant transcript ENST00000318100 type snp variant_id 147788 vcf_id None Traceback (most recent call last): File "/opt/tools/vcf2db/vcf2db.py", line 923, in impacts_extras=a.impacts_field, aok=a.a_ok) File "/opt/tools/vcf2db/vcf2db.py", line 233, in init self.load() File "/opt/tools/vcf2db/vcf2db.py", line 321, in load self._load(self.vcf, create=False, start=i+1) File "/opt/tools/vcf2db/vcf2db.py", line 311, in _load self.insert(variants, expanded, keys, i, create=create) File "/opt/tools/vcf2db/vcf2db.py", line 373, in insert 
vilengths, variant_impacts) File "/opt/tools/vcf2db/vcf2db.py", line 401, in _insert self.__insert(v_objs, self.metadata.tables['variants'].insert()) File "/opt/tools/vcf2db/vcf2db.py", line 435, in __insert raise e sqlalchemy.exc.StatementError: (exceptions.TypeError) float() argument must be a string or a number [SQL: u'INSERT INTO variants (variant_id, chrom, start, "end", vcf_id, ref, alt, qual, filter, type, sub_type, call_rate, num_hom_ref, num_het, num_hom_alt, num_unknown, aaf, gene, ensembl_gene_id, transcript, is_exonic, is_coding, is_lof, is_splicing, is_canonical, exon, codon_change, aa_change, aa_length, biotype, impact, impact_so, impact_severity, polyphen_pred, polyphen_score, sift_pred, sift_score, ac, af, an, baseqranksum, cadd, cadd_indel, ccr, clippingranksum, confidentregion, dp, ds, excesshet, fathmm_xf_noncoding, fs, genehancer, inhousedb_ac, lof, mleac, mleaf, mq, mqranksum, nmd, old_multiallelic, old_variant, primateai, qd, readposranksum, sor, spliceai_acceptorgain, spliceai_acceptorloss, spliceai_donorgain, spliceai_donorloss, af_1kg_afr, af_1kg_all, af_1kg_amr, af_1kg_eas, af_1kg_eur, af_1kg_sas, af_esp_aa, af_esp_all, af_esp_ea, clinvar_dbinfo, clinvar_disease_name, clinvar_pathogenic, cosmic_ids, cpg_island, cse_hiseq, dgv, eqtl_gtex_wholebloodv7, encode_consensus_gm12878, encode_consensus_h1hesc, encode_consensus_helas3, encode_consensus_hepg2, encode_consensus_huvec, encode_consensus_k562, fitcons, gerp_elements, gnomad_exome_af_controls, gnomad_exome_ac_global, gnomad_exome_ac_popmax, gnomad_exome_af_global, gnomad_exome_af_popmax, gnomad_exome_an_global, gnomad_exome_an_popmax, gnomad_exome_hom_controls, gnomad_exome_hom_global, gnomad_exome_hom_popmax, gnomad_exome_popmax, gnomad_genome_af_controls, gnomad_genome_ac_global, gnomad_genome_ac_popmax, gnomad_genome_af_global, gnomad_genome_af_popmax, gnomad_genome_an_global, gnomad_genome_an_popmax, gnomad_genome_hom_controls, gnomad_genome_hom_global, gnomad_genome_hom_popmax, 
gnomad_genome_popmax, gwas_pubmed_trait, hapmap1, hapmap2, in_rlcr, in_segdup, pp2hdiv, pp2hvar, rmsk, rs_ids, "set", stam_mean, stam_names, tfbs, gts, gt_types, gt_phases, gt_depths, gt_ref_depths, gt_alt_depths, gt_quals, gt_alt_freqs) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: [{u'gnomad_genome_hom_controls': 1208, u'gnomad_exome_af_global': -1.0, u'gnomad_exome_AF_controls': None, u'InHouseDB_AC': 18, u'CCR': None, 'gt_phases': <read-only buffer for 0x7f20227a0480, size -1, offset 0 at 0x7f202279f3b0>, u'cse-hiseq': None, u'clinvar_dbInfo': None, 'variant_id': 147788, 'alt': u'T', u'gnomad_exome_hom_controls': None, u'eQTL_GTEX_WholeBloodv7': u'STMN3', 'num_unknown': 1, u'spliceai_donorloss': (0.0, 0.0), 'aa_length': None, 'gt_types': <read-only buffer for 0x7f202279d308, size -1, offset 0 at 0x7f2022793730>, 'is_lof': False, 'ds': False, 'gts': 'S\x0b(./.\x00C/T\x00C/T', u'gnomad_exome_af_popmax': -1.0, u'gnomad_genome_hom_global': 3344, u'af_1kg_sas': 0.2689000070095062, u'gnomad_genome_ac_global': 14152, 'is_exonic': False, u'primateai': None, 'exon': u'10/35', u'clinvar_disease_name': None, 'ensembl_gene_id': None, 'chrom': u'chr20', 'polyphen_score': None, u'dp': 81, u'readposranksum': None, u'spliceai_donorgain': (0.10320000350475311, 0.10320000350475311), 'is_canonical': False, u'rmsk': None, u'gnomad_genome_popmax': u'nfe', u'gnomad_genome_an_popmax': 1874, u'encode_consensus_k562': u'T', 'num_het': 2, u'old_variant': 'None', 'sift_pred': None, 'gt_depths': <read-only buffer for 0x7f202279d340, size -1, offset 0 at 0x7f202279f370>, u'qd': None, 'effect_severity': 'LOW', 
u'stam_names': None, u'af_1kg_amr': 0.4207000136375427, u'SpliceAI_DonorGain': (0.10320000350475311, 0.10320000350475311), u'gnomad_genome_af_controls': 0.46000000834465027, u'set': u'variant-variant2', 'vcf_id': None, 'gt_quals': <read-only buffer for 0x7f202279d3e8, size -1, offset 0 at 0x7f202279f470>, 'gt_alt_depths': <read-only buffer for 0x7f202279d3b0, size -1, offset 0 at 0x7f202279f430>, u'ConfidentRegion': True, u'pp2hvar': None, u'spliceai_acceptorloss': (0.0, 0.0), u'ExcessHet': 3.0102999210357666, u'gnomad_genome_af_global': 0.45170000195503235, 'gt_ref_depths': <read-only buffer for 0x7f202279d378, size -1, offset 0 at 0x7f202279f3f0>, 'call_rate': 0.6666666666666666, u'NMD': None, u'af_1kg_all': 0.3379000127315521, u'clinvar_dbinfo': 'None', u'encode_consensus_gm12878': u'T', u'af_esp_all': -1.0, u'gnomad_genome_an_global': 31330, 'ref': u'C', 'gt_alt_freqs': <read-only buffer for 0x7f202279f4b0, size -1, offset 0 at 0x7f202279f4f0>, u'ClippingRankSum': 0.0, u'stam_mean': None, 'impact': u'intron_variant', u'gnomad_genome_AF_controls': 0.46000000834465027, u'mqranksum': 0.0, u'af_1kg_eas': 0.1160999983549118, u'af_esp_ea': -1.0, u'nmd': 'None', u'old_multiallelic': None, u'lof': 'None', 'sub_type': 'ts', u'encode_consensus_hepg2': u'T', u'excesshet': 3.0102999210357666, u'encode_consensus_helas3': u'T', u'ANN': None, u'af_1kg_eur': 0.5128999948501587, u'SpliceAI_AcceptorGain': (9.999999747378752e-05, 9.999999747378752e-05), 'filter': None, 'codon_change': u'c.919+1311C>T', u'gnomad_genome_af_popmax': 7602, u'MQ': 60.0, u'gnomad_exome_hom_global': None, u'hapmap2': 107.23280334472656, u'SpliceAI_DonorLoss': (0.0, 0.0), u'hapmap1': 1.813599944114685, u'SpliceAI_AcceptorLoss': (0.0, 0.0), u'FS': None, u'gerp_elements': None, 'top_consequence': u'intron_variant', u'gnomad_exome_an_global': None, 'num_hom_ref': 0, u'gnomad_genome_ac_popmax': 0.49380001425743103, 'is_splicing': False, u'PrimateAI': None, u'cadd': u'2.623', u'cosmic_ids': None, u'rs_ids': 
u'rs56053617', u'fitcons': 0.10610000044107437, u'MLEAC': 1, u'MLEAF': 0.5, u'gnomad_exome_hom_popmax': None, u'af': 0.5, 'polyphen_pred': None, u'cadd_indel': None, u'genehancer': None, 'start': 62306756, 'sift_score': None, u'OLD_MULTIALLELIC': None, 'type': 'snp', u'af_1kg_afr': 0.38119998574256897, u'MQRankSum': 0.0, 'impact_severity': 'LOW', u'pp2hdiv': None, u'gnomad_genome_hom_popmax': 15396, u'inhousedb_ac': 18, 'qual': 673.7999877929688, u'spliceai_acceptorgain': (9.999999747378752e-05, 9.999999747378752e-05), u'fathmm_xf_noncoding': u'0.029967', u'confidentregion': True, u'baseqranksum': None, 'aaf': 0.5, u'encode_consensus_huvec': u'T', u'in_rlcr': False, u'tfbs': None, u'mq': 60.0, 'num_hom_alt': 0, u'clinvar_pathogenic': None, u'in_segdup': False, u'ac': 2, u'BaseQRankSum': None, u'LOF': None, u'mleac': 1, u'ann': None, u'gnomad_exome_popmax': None, u'an': 4, u'encode_consensus_h1hesc': u'T', u'CADD': u'2.623', 'cse_hiseq': False, u'mleaf': 0.5, u'sor': None, u'FATHMM-XF-NONCODING': u'0.029967', u'DP': 81, u'GeneHancer': None, u'gnomad_exome_an_popmax': None, 'end': 62306757, u'gwas_pubmed_trait': None, u'cpg_island': False, u'gnomad_exome_ac_popmax': None, u'OLD_VARIANT': None, u'CADD_indel': None, u'SOR': None, u'clippingranksum': 0.0, u'eqtl_gtex_wholebloodv7': u'STMN3', u'AC': 2, u'fs': None, 'is_coding': False, u'gnomad_exome_af_controls': -1.0, u'AF': 0.5, u'AN': 4, u'dgv': u'CopyNumber', 'biotype': u'protein_coding', 'transcript': u'ENST00000318100', u'ReadPosRankSum': None, 'gene': u'RTEL1', 'aa_change': u'', u'ccr': None, u'af_esp_aa': -1.0, 'so': u'intron_variant', u'QD': None, 'impact_so': u'intron_variant', u'gnomad_exome_ac_global': None}]]

Phillip-a-richmond commented 5 years ago

Note: When I removed SpliceAI from my vcfanno config, the problem went away, so it's definitely the SpliceAI annotation causing this. I'm not sure how best to deal with it, though, since we do want SpliceAI in our pipeline.

Cheers, Phil

brentp commented 5 years ago

When you run vcfanno for SpliceAI, use max as the aggregator. Otherwise you get multiple values per variant (as you've noted), and these cause problems with vcf2db.
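To make the failure concrete, here is a minimal Python sketch of what goes wrong and what a max aggregation does to the values. The helper `collapse_multi_valued` is hypothetical and is not part of vcf2db or vcfanno; the SpliceAI values are copied from one of the bad records earlier in this report. In practice the aggregation is set in the vcfanno config (the `ops` entry for each annotated field) rather than done in Python.

```python
# Hypothetical helper for illustration only; not part of vcf2db or vcfanno.
def collapse_multi_valued(record, op=max):
    """Return a copy of `record` with tuple/list values reduced to one value."""
    collapsed = {}
    for key, value in record.items():
        if isinstance(value, (tuple, list)):
            # Ignore missing entries; a field with no usable values stays None.
            present = [v for v in value if v is not None]
            collapsed[key] = op(present) if present else None
        else:
            collapsed[key] = value
    return collapsed


# Values taken from a bad record in the error output of this issue.
record = {
    "SpliceAI_AcceptorGain": (0.41990000009536743, 0.3418000042438507),
    "SpliceAI_AcceptorLoss": (0.0, 0.6237000226974487),
    "CADD": 22.1,
}

# A float column needs a single number, so converting a tuple fails.
try:
    float(record["SpliceAI_AcceptorGain"])
except TypeError as exc:
    print("tuple value is rejected:", exc)

# After taking the max, every SpliceAI field holds one float.
fixed = collapse_multi_valued(record)
print(fixed["SpliceAI_AcceptorGain"])
print(fixed["SpliceAI_AcceptorLoss"])
```

Taking the max keeps the strongest predicted splice effect across the overlapping genes, which is usually the value of interest when filtering; the per-gene breakdown is lost, so keep the annotated VCF if you need it later.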

Phillip-a-richmond commented 5 years ago

Fixed the issue. Thanks.

-Phil