Closed bryce-turner closed 4 years ago
I am unable to reproduce the error with the header and the data line you provided. What is the exact command you are running? Any chance you could provide a test case?
Happy to provide the example file. Do you have a DM link for the files?
Thank you for the test case. The problem was introduced when 64-bit support was added to htslib. A minimal example to reproduce the problem:
$ cat test.vcf
##fileformat=VCFv4.2
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="dummy">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="dummy">
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 1 . G C . . MPOS=-2147483648;NALOD=-8.279e-01;NLOD=15.45;POPAF=6.00
$ bcftools view test.vcf
Is it the case that the problematic line (from which Petr has distilled a minimal example) is in fact the line following the chr1 17000202 . A C line shown in @TGEN-BTurner's original report? (And if so, it would be great if you'd use zcat to post that line here too.)
(Or it may be several lines further on: the way that line has been clipped at …|1:17000 suggests that the ‘final’ line of output you're seeing is an artefact of stdout buffering.)
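One way to locate the record that actually triggers the error, rather than the last line that happened to reach stdout, is to scan the uncompressed VCF text for INFO values equal to -2147483648, the value used in the minimal example. The following is only a sketch: the function name find_int32_min_info is hypothetical, and it assumes a tab-delimited VCF whose INFO column is the eighth field.

```python
INT32_MIN = -2**31  # -2147483648, the MPOS value in the minimal example


def find_int32_min_info(lines):
    """Return (chrom, pos, key) for every INFO value equal to INT32_MIN.

    Hypothetical helper for illustration only; it is not part of
    bcftools or htslib. Assumes tab-delimited VCF text lines.
    """
    hits = []
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], fields[1], fields[7]
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            if not sep:
                continue  # flag-type INFO entries carry no value
            # INFO values may be comma-separated vectors.
            if any(v == str(INT32_MIN) for v in value.split(",")):
                hits.append((chrom, int(pos), key))
    return hits
```

For a bgzipped file, the lines could be supplied with gzip.open(path, "rt"), which reads the same text that zcat prints.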
Indeed we still haven't seen the original data which triggered the whole problem. @pd3, was the MPOS field you constructed for your example the same name and value as in the test data you were provided? It would really help the bug report to know that the issue we found and fixed is in fact the same one. @TGEN-BTurner, can you please check whether PR samtools/htslib#1000 fixes your problem?
I can confirm that samtools/htslib#1000 fixes the problem. I tested on a different sample than before, but here is the output before and after applying the fix:
Before:
chr1 43290221 . T A . base_qual;haplotype;weak_evidence CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0 0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1 43290242 . C A . haplotype;weak_evidence CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0 0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1 43314053 . TTGTG T,TTG . germline;normal_artifact CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;[E::bcf_fmt_array] Unexpected type 0
After:
chr1 43290221 . T A . base_qual;haplotype;weak_evidence CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0 0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1 43290242 . C A . haplotype;weak_evidence CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0 0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1 43314053 . TTGTG T,TTG . germline;normal_artifact CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;NLOD=3.44,-18.72;POPAF=6,6;ROQ=93;RPA=11,9,10;RU=TG;SEQQ=93;STR;STRANDQ=44;STRQ=93;TLOD=3.28,17.87 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:61,2,9:0.037,0.133:72:27,2,3:28,0,4:9,52,1,10 0/0:48,3,10:0.058,0.172:61:24,0,5:21,3,5:12,36,4,9
chr1 43363190 . G GT . normal_artifact;slippage;weak_evidence CONTQ=30;DP=443;ECNT=1;GERMQ=93;MBQ=38,34;MFRL=181,184;MMQ=60,60;MPOS=21;NALOD=-3.447;NLOD=35.08;POPAF=6;ROQ=93;RPA=10,11;RU=T;SEQQ=1;STR;STRANDQ=54;STRQ=1;TLOD=3.29 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:173,7:0.034:180:107,3:65,3:79,94,3,4 0/0:166,7:0.036:173:88,4:73,2:70,96,4,3
chr1 43422694 . T C . haplotype;normal_artifact;position;strand_bias CONTQ=69;DP=269;ECNT=2;GERMQ=93;MBQ=37,31;MFRL=173,182;MMQ=60,60;MPOS=0;NALOD=-18.25;NLOD=8.84;POPAF=6;ROQ=64;SEQQ=93;STRANDQ=1;TLOD=21.52 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:127,8:0.066:135:62,6:55,2:0|1:43422694_T_C:43422694:75,52,8,0 0|0:127,7:0.059:134:61,3:59,2:0|1:43422694_T_C:43422694:89,38,7,0
chr1 43422696 . T C . haplotype;normal_artifact;strand_bias CONTQ=69;DP=279;ECNT=2;GERMQ=93;MBQ=38,33;MFRL=172,182;MMQ=60,60;MPOS=-2147483648;NALOD=-18.27;NLOD=8.58;POPAF=6;ROQ=55;SEQQ=93;STRANDQ=1;TLOD=21.51 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:127,8:0.065:135:66,4:60,1:0|1:43422694_T_C:43422694:75,52,8,0 0|0:126,7:0.059:133:64,2:58,3:0|1:43422694_T_C:43422694:89,37,7,0
chr1 43499804 . GT G . slippage;weak_evidence CONTQ=15;DP=19;ECNT=1;GERMQ=8;MBQ=39,36;MFRL=168,220;MMQ=60,60;MPOS=15;NALOD=0.715;NLOD=2.36;POPAF=6;ROQ=93;RPA=10,9;RU=T;SEQQ=1;STR;STRANDQ=14;STRQ=1;TLOD=3.67 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:6,2:0.303:8:5,2:1,0:1,5,0,2 0/0:8,0:0.097:8:7,0:1,0:4,4,0,0
chr1 43592587 . G A . contamination;weak_evidence CONTQ=1;DP=145;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=194,159;MMQ=60,60;MPOS=33;NALOD=1.86;NLOD=21.07;POPAF=4.85;ROQ=44;SEQQ=1;STRANDQ=8;TLOD=3.56 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:61,2:0.045:63:35,2:25,0:3,58,0,2 0/0:70,0:0.014:70:45,0:25,0:3,67,0,0
chr1 43621836 . C T . contamination;weak_evidence CONTQ=1;DP=209;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=197,217;MMQ=60,60;MPOS=35;NALOD=2.02;NLOD=30.39;POPAF=6;ROQ=63;SEQQ=1;STRANDQ=8;TLOD=3.08 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:96,2:0.03:98:58,1:35,1:90,6,2,0 0/0:101,0:0.009441:101:65,0:36,0:92,9,0,0
On request, the proposal is now a bit different. That MPOS=-2147483648 will become MPOS=. so that such data can be written to BCF. That's over in samtools/htslib#1004.
I think this is fine. The -2147483648 is just the result of a ghastly bug due to failure to initialise a variable correctly. Replacing it with the "missing" value is the most accurate representation of what happened.
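For context on why "missing" is a natural replacement: in the BCF2 binary encoding, the 32-bit integer bit pattern 0x80000000 (that is, -2147483648) is reserved as the missing-value sentinel, and 0x80000001 marks end-of-vector. htslib exposes these as bcf_int32_missing and bcf_int32_vector_end. The sketch below only illustrates that coincidence; format_info_int is a hypothetical helper, not htslib's implementation of the fix.

```python
import struct

# Sentinel values reserved for 32-bit integers in the BCF2 encoding.
BCF_INT32_MISSING = -2**31         # 0x80000000
BCF_INT32_VECTOR_END = -2**31 + 1  # 0x80000001


def format_info_int(value):
    """Render an integer INFO value as VCF text.

    Hypothetical helper: a value equal to the missing sentinel is
    written as ".", everything else as its decimal representation.
    """
    return "." if value == BCF_INT32_MISSING else str(value)


def pack_int32(value):
    """Pack a value as a little-endian signed 32-bit integer."""
    return struct.pack("<i", value)
```

Because -2147483648 packs to exactly the bytes reserved for "missing", a literal MPOS=-2147483648 cannot be stored in BCF as an ordinary 32-bit integer and then read back as that number, which is consistent with the decision to write it as MPOS=. instead.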
After testing with the latest release (1.10) we've encountered an error when using bcftools view and filter:
chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-0.8027;NLOD=15.98;POPAF=6;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000[E::bcf_fmt_array] Unexpected type 0
However if we look at this same line with zcat we see:
chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-8.027e-01;NLOD=15.98;POPAF=6.00;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000202_A_C:17000202:11,57,1,0
We don't encounter this [E::bcf_fmt_array] Unexpected type 0 when using bcftools 1.9 though. Additionally here is our header, excluding the contigs: