samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools +split-vep -s worst; Too few columns at chr1:69059 .. 1 (Consequence) >= 1 #2057

Closed robertzeibich closed 8 months ago

robertzeibich commented 8 months ago

I downloaded the gnomAD v4 chr1 genomes sites VCF (Downloads > Variants > Genomes): https://storage.googleapis.com/gcp-public-data--gnomad/release/4.0/vcf/genomes/gnomad.genomes.v4.0.sites.chr1.vcf.bgz

version: bcftools 1.15

command: bcftools +split-vep -s worst -a vep -f '%CHROM-%POS-%REF-%ALT\t%AF\t%AF_nfe\t%AF_asj\t%AF_mid\t%AF_fin\t%AF_ami\t%AF_amr\t%AF_remaining\t%AF_eas\t%AF_sas\t%AF_afr\t%IMPACT\t%SYMBOL\t%Feature_type\t%BIOTYPE\n' /scratch/xm41/rzei0002/gnomAD_v4_Variants_Genomes/gnomad.genomes.v4.0.sites.chr1.vcf.bgz

Error message:
chr1-69038-T-G 0 0 0 0 0 0 0 0 0 0 0 MODERATE OR4F5 Transcript protein_coding
chr1-69045-A-G 7.77605e-05 0 0 0 0 0 0 0 0 0.001937
Too few columns at chr1:69059 .. 1 (Consequence) >= 1

I updated my bcftools version to 1.19. The error message has changed:

Warning: The number of INFO/vep subfields at chr1:69059 does not match the header definition, expected 46 subfields, found as few as 1. (This warning is printed only once.)

In the header there are 45 "|" separators, i.e. 46 subfields:

Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|ALLELE_NUM|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|UNIPROT_ISOFORM|SOURCE|DOMAINS|miRNA|HGVS_OFFSET|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|LoF|LoF_filter|LoF_flags|LoF_info
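The count is easy to double-check. The following is a minimal Python sketch, not part of bcftools, that splits the header's subfield list on "|" and reports the number of separators and subfields, plus the zero-based position of Consequence. The variable names are only illustrative. Reading the older "1 (Consequence) >= 1" message as "subfield index 1 was requested but only 1 column was found" is an assumption that happens to be consistent with these numbers.

```python
# Subfield names copied from the INFO/vep header description quoted above.
VEP_FORMAT = (
    "Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|"
    "HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|"
    "ALLELE_NUM|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|"
    "MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|UNIPROT_ISOFORM|SOURCE|"
    "DOMAINS|miRNA|HGVS_OFFSET|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|"
    "MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|LoF|LoF_filter|LoF_flags|LoF_info"
)

fields = VEP_FORMAT.split("|")

# Number of "|" separators and number of subfields they delimit.
print(VEP_FORMAT.count("|"), len(fields))

# Zero-based positions of the second and the last subfield.
print(fields.index("Consequence"), fields.index("LoF_info"))
```

This prints 45 and 46 on the first line and 1 and 45 on the second, so the header itself agrees with the "expected 46 subfields" in the warning.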

In the body of the VCF there are 3x47 "|" at chr1:69059. Strangely, the error is thrown only when there is information after the last pipe.

chr1 69059 . G A . AC0;AS_VQSR AC=0;AN=15346;AF=0.00000;AC_XX=0;AF_XX=0.00000;AN_XX=8562;nhomalt_XX=0;AC_XY=0;AF_XY=0.00000;AN_XY=6784;nhomalt_XY=0;nhomalt=0;AC_afr_XX=0;AF_afr_XX=0.00000;AN_afr_XX=4588;nhomalt_afr_XX=0;AC_afr_XY=0;AF_afr_XY=0.00000;AN_afr_XY=3568;nhomalt_afr_XY=0;AC_afr=0;AF_afr=0.00000;AN_afr=8156;nhomalt_afr=0;AC_ami_XX=0;AF_ami_XX=0.00000;AN_ami_XX=18;nhomalt_ami_XX=0;AC_ami_XY=0;AF_ami_XY=0.00000;AN_ami_XY=18;nhomalt_ami_XY=0;AC_ami=0;AF_ami=0.00000;AN_ami=36;nhomalt_ami=0;AC_amr_XX=0;AF_amr_XX=0.00000;AN_amr_XX=532;nhomalt_amr_XX=0;AC_amr_XY=0;AF_amr_XY=0.00000;AN_amr_XY=556;nhomalt_amr_XY=0;AC_amr=0;AF_amr=0.00000;AN_amr=1088;nhomalt_amr=0;AC_asj_XX=0;AF_asj_XX=0.00000;AN_asj_XX=64;nhomalt_asj_XX=0;AC_asj_XY=0;AF_asj_XY=0.00000;AN_asj_XY=60;nhomalt_asj_XY=0;AC_asj=0;AF_asj=0.00000;AN_asj=124;nhomalt_asj=0;AC_eas_XX=0;AF_eas_XX=0.00000;AN_eas_XX=592;nhomalt_eas_XX=0;AC_eas_XY=0;AF_eas_XY=0.00000;AN_eas_XY=768;nhomalt_eas_XY=0;AC_eas=0;AF_eas=0.00000;AN_eas=1360;nhomalt_eas=0;AC_fin_XX=0;AF_fin_XX=0.00000;AN_fin_XX=58;nhomalt_fin_XX=0;AC_fin_XY=0;AF_fin_XY=0.00000;AN_fin_XY=60;nhomalt_fin_XY=0;AC_fin=0;AF_fin=0.00000;AN_fin=118;nhomalt_fin=0;AC_mid_XX=0;AF_mid_XX=0.00000;AN_mid_XX=34;nhomalt_mid_XX=0;AC_mid_XY=0;AF_mid_XY=0.00000;AN_mid_XY=22;nhomalt_mid_XY=0;AC_mid=0;AF_mid=0.00000;AN_mid=56;nhomalt_mid=0;AC_nfe_XX=0;AF_nfe_XX=0.00000;AN_nfe_XX=2340;nhomalt_nfe_XX=0;AC_nfe_XY=0;AF_nfe_XY=0.00000;AN_nfe_XY=1264;nhomalt_nfe_XY=0;AC_nfe=0;AF_nfe=0.00000;AN_nfe=3604;nhomalt_nfe=0;AC_raw=1;AF_raw=9.96492e-06;AN_raw=100352;nhomalt_raw=0;AC_remaining_XX=0;AF_remaining_XX=0.00000;AN_remaining_XX=110;nhomalt_remaining_XX=0;AC_remaining_XY=0;AF_remaining_XY=0.00000;AN_remaining_XY=98;nhomalt_remaining_XY=0;AC_remaining=0;AF_remaining=0.00000;AN_remaining=208;nhomalt_remaining=0;AC_sas_XX=0;AF_sas_XX=0.00000;AN_sas_XX=226;nhomalt_sas_XX=0;AC_sas_XY=0;AF_sas_XY=0.00000;AN_sas_XY=370;nhomalt_sas_XY=0;AC_sas=0;AF_sas=0.00000;AN_sas=596;nhomalt_sas=0;AC_joint_XX=0;AF_joint_XX=0.00000;AN_joint_XX=8562;nhomalt_joint_XX=0;AC_joint_XY=0;AF_joint_XY=0.00000;AN_joint_XY=6784;nhomalt_joint_XY=0;AC_joint=0;AF_joint=0.00000;AN_joint=15346;nhomalt_joint=0;AC_joint_afr_XX=0;AF_joint_afr_XX=0.00000;AN_joint_afr_XX=4588;nhomalt_joint_afr_XX=0;AC_joint_afr_XY=0;AF_joint_afr_XY=0.00000;AN_joint_afr_XY=3568;nhomalt_joint_afr_XY=0;AC_joint_afr=0;AF_joint_afr=0.00000;AN_joint_afr=8156;nhomalt_joint_afr=0;AC_joint_ami_XX=0;AF_joint_ami_XX=0.00000;AN_joint_ami_XX=18;nhomalt_joint_ami_XX=0;AC_joint_ami_XY=0;AF_joint_ami_XY=0.00000;AN_joint_ami_XY=18;nhomalt_joint_ami_XY=0;AC_joint_ami=0;AF_joint_ami=0.00000;AN_joint_ami=36;nhomalt_joint_ami=0;AC_joint_amr_XX=0;AF_joint_amr_XX=0.00000;AN_joint_amr_XX=532;nhomalt_joint_amr_XX=0;AC_joint_amr_XY=0;AF_joint_amr_XY=0.00000;AN_joint_amr_XY=556;nhomalt_joint_amr_XY=0;AC_joint_amr=0;AF_joint_amr=0.00000;AN_joint_amr=1088;nhomalt_joint_amr=0;AC_joint_asj_XX=0;AF_joint_asj_XX=0.00000;AN_joint_asj_XX=64;nhomalt_joint_asj_XX=0;AC_joint_asj_XY=0;AF_joint_asj_XY=0.00000;AN_joint_asj_XY=60;nhomalt_joint_asj_XY=0;AC_joint_asj=0;AF_joint_asj=0.00000;AN_joint_asj=124;nhomalt_joint_asj=0;AC_joint_eas_XX=0;AF_joint_eas_XX=0.00000;AN_joint_eas_XX=592;nhomalt_joint_eas_XX=0;AC_joint_eas_XY=0;AF_joint_eas_XY=0.00000;AN_joint_eas_XY=768;nhomalt_joint_eas_XY=0;AC_joint_eas=0;AF_joint_eas=0.00000;AN_joint_eas=1360;nhomalt_joint_eas=0;AC_joint_fin_XX=0;AF_joint_fin_XX=0.00000;AN_joint_fin_XX=58;nhomalt_joint_fin_XX=0;AC_joint_fin_XY=0;AF_joint_fin_XY=0.00000;AN_joint_fin_XY=60;nhomalt_joint_fin_XY=0;AC_joint_fin=0;AF_joint_fin=0.00000;AN_joint_fin=118;nhomalt_joint_fin=0;AC_joint_mid_XX=0;AF_joint_mid_XX=0.00000;AN_joint_mid_XX=34;nhomalt_joint_mid_XX=0;AC_joint_mid_XY=0;AF_joint_mid_XY=0.00000;AN_joint_mid_XY=22;nhomalt_joint_mid_XY=0;AC_joint_mid=0;AF_joint_mid=0.00000;AN_joint_mid=56;nhomalt_joint_mid=0;AC_joint_nfe_XX=0;AF_joint_nfe_XX=0.00000;AN_joint_nfe_XX=2340;nhomalt_joint_nfe_XX=0;AC_joint_nfe_XY=0;AF_joint_nfe_XY=0.00000;AN_joint_nfe_XY=1264;nhomalt_joint_nfe_XY=0;AC_joint_nfe=0;AF_joint_nfe=0.00000;AN_joint_nfe=3604;nhomalt_joint_nfe=0;AC_joint_raw=1;AF_joint_raw=9.96492e-06;AN_joint_raw=100352;nhomalt_joint_raw=0;AC_joint_remaining_XX=0;AF_joint_remaining_XX=0.00000;AN_joint_remaining_XX=110;nhomalt_joint_remaining_XX=0;AC_joint_remaining_XY=0;AF_joint_remaining_XY=0.00000;AN_joint_remaining_XY=98;nhomalt_joint_remaining_XY=0;AC_joint_remaining=0;AF_joint_remaining=0.00000;AN_joint_remaining=208;nhomalt_joint_remaining=0;AC_joint_sas_XX=0;AF_joint_sas_XX=0.00000;AN_joint_sas_XX=226;nhomalt_joint_sas_XX=0;AC_joint_sas_XY=0;AF_joint_sas_XY=0.00000;AN_joint_sas_XY=370;nhomalt_joint_sas_XY=0;AC_joint_sas=0;AF_joint_sas=0.00000;AN_joint_sas=596;nhomalt_joint_sas=0;faf95=0.00000;faf95_afr=0.00000;faf95_amr=0.00000;faf95_eas=0.00000;faf95_nfe=0.00000;faf95_sas=0.00000;faf99=0.00000;faf99_afr=0.00000;faf99_amr=0.00000;faf99_eas=0.00000;faf99_nfe=0.00000;faf99_sas=0.00000;faf95_joint=0.00000;faf95_joint_afr=0.00000;faf95_joint_amr=0.00000;faf95_joint_eas=0.00000;faf95_joint_nfe=0.00000;faf95_joint_sas=0.00000;faf99_joint=0.00000;faf99_joint_afr=0.00000;faf99_joint_amr=0.00000;faf99_joint_eas=0.00000;faf99_joint_nfe=0.00000;faf99_joint_sas=0.00000;age_hist_het_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_het_n_smaller=0;age_hist_het_n_larger=0;age_hist_hom_bin_freq=0|0|0|0|0|0|0|0|0|0;age_hist_hom_n_smaller=0;age_hist_hom_n_larger=0;FS=0.00000;MQ=25.0000;MQRankSum=0.198000;QUALapprox=84;QD=4.94118;ReadPosRankSum=0.198000;SOR=0.693000;VarDP=17;AS_FS=0.00000;AS_MQ=25.0000;AS_MQRankSum=0.736000;AS_pab_max=1.00000;AS_QUALapprox=70;AS_QD=7.00000;AS_ReadPosRankSum=0.198000;AS_SB_TABLE=3,9|0,3;AS_SOR=1.26976;AS_VarDP=10;inbreeding_coeff=-9.96502e-06;AS_culprit=AS_MQ;AS_VQSLOD=-4.00900;allele_type=snv;n_alt_alleles=2;variant_type=multi-snv;segdup;gq_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;gq_hist_all_bin_freq=0|0|0|0|5705|1031|693|182|40|15|5|0|0|2|0|0|0|0|0|0;dp_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_alt_n_larger=0;dp_hist_all_bin_freq=0|0|5012|2106|370|147|36|2|0|0|0|0|0|0|0|0|0|0|0|0;dp_hist_all_n_larger=0;ab_hist_alt_bin_freq=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;cadd_raw_score=0.894153;cadd_phred=10.3700;phylop=5.29800;VRS_Allele_IDs=ga4gh:VA.n5AaNLXbJUASX7qF622bJRA0OcehrREM,ga4gh:VA.5QUPg2mdqsBd0U8k-cQVW-VeqoenCw5c;VRS_Starts=69058,69058;VRS_Ends=69059,69059;VRS_States=G,A;vep=A|stop_gained|HIGH|OR4F5|ENSG00000186092|Transcript|ENST00000641515|protein_coding|3/3||ENST00000641515.2:c.32G>A|ENSP00000493376.2:p.Trp11Ter|92|32|11|W/|tGg/tAg|1||1||SNV|HGNC|HGNC:14825|YES|NM_001005484.2|||P1||ENSP00000493376||Ensembl||||||||||||HC|||PERCENTILE:0.0326197757390418,GERP_DIST:0,BP_DIST:949,DIST_FROM_LAST_EXON:-22,50_BP_RULE:FAIL,ANN_ORF:2015.65,MAX_ORF:2015.65,A|downstream_gene_variant|MODIFIER|OR4G11P|ENSG00000240361|Transcript|ENST00000642116|processed_transcript||||||||||1|4943|1||SNV|HGNC|HGNC:31276|||||||||Ensembl|||||||||||||||,A|stop_gained|HIGH|OR4F5|79501|Transcript|NM_001005484.2|protein_coding|3/3||NM_001005484.2:c.32G>A|NP_001005484.2:p.Trp11Ter|92|32|11|W/|tGg/tAg|1||1||SNV|EntrezGene|HGNC:14825|YES|ENST00000641515.2|||||NP_001005484.2||RefSeq||||||||||||HC|||PERCENTILE:0.0326197757390418,GERP_DIST:0,BP_DIST:949,DIST_FROM_LAST_EXON:-22,50_BP_RULE:FAIL,PHYLOCSF_TOO_SHORT
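A possible explanation, offered as an assumption rather than something confirmed in this thread: in the record above the text after the last pipe (LoF_info) contains commas, and the comma is also the separator between per-transcript annotation entries. A parser that splits on "," first and on "|" second would then see fragments such as GERP_DIST:0 as entries with a single subfield. The Python sketch below illustrates this with simplified, hypothetical entries: only three of the 46 subfields are filled in, `make_entry` is a made-up helper, and the LoF_info value is abbreviated from the record. It is not how bcftools is implemented.

```python
N_SUBFIELDS = 46  # number of subfields declared in the header


def make_entry(allele, consequence, lof_info):
    """Build a simplified annotation entry; all other subfields stay empty."""
    fields = [""] * N_SUBFIELDS
    fields[0], fields[1], fields[-1] = allele, consequence, lof_info
    return "|".join(fields)


# LoF_info abbreviated from the chr1:69059 record; note the embedded commas.
lof_info = "PERCENTILE:0.0326197757390418,GERP_DIST:0,BP_DIST:949"

# Two entries joined by the usual comma separator.
vep = ",".join([
    make_entry("A", "stop_gained", lof_info),
    make_entry("A", "downstream_gene_variant", ""),
])

# Naive parse: comma separates entries, pipe separates subfields.
counts = [len(chunk.split("|")) for chunk in vep.split(",")]
print(counts)
print("fewest subfields in one chunk:", min(counts))
```

With these inputs the per-chunk counts are 46, 1, 1 and 46: the second entry, whose last subfield is empty, parses cleanly, while the first one spills two single-subfield fragments. That would match both the "found as few as 1" warning and the observation that the error appears only when there is information after the last pipe.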

pd3 commented 8 months ago

Please update to the latest version of the program; I believe this problem has been fixed.