samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools 1.10 Unexpected type 0 #1123

Closed: bryce-turner closed this issue 4 years ago

bryce-turner commented 4 years ago

After testing with the latest release (1.10) we've encountered an error when using bcftools view and filter:

chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-0.8027;NLOD=15.98;POPAF=6;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000[E::bcf_fmt_array] Unexpected type 0

However, if we look at this same line with zcat, we see:

chr1 17000202 . A C . clustered_events;haplotype;normal_artifact;strand_bias CONTQ=64;DP=125;ECNT=3;GERMQ=93;MBQ=17,29;MFRL=193,164;MMQ=60,60;MPOS=6;NALOD=-8.027e-01;NLOD=15.98;POPAF=6.00;ROQ=69;SEQQ=53;STRANDQ=1;TLOD=11.37 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:38,6:0.143:44:8,3:4,1:0|1:17000202_A_C:17000202:8,30,6,0 0|0:68,1:0.028:69:12,1:22,0:0|1:17000202_A_C:17000202:11,57,1,0

We don't encounter this [E::bcf_fmt_array] Unexpected type 0 error when using bcftools 1.9, though. Additionally, here is our header, excluding the contigs:


##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=numt_chimera,Description="NuMT variant with too many ALT reads originally from autosome">
##FILTER=<ID=numt_novel,Description="Alt depth is below expected coverage of NuMT in autosome">
##FILTER=<ID=orientation,Description="orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Site filtered due to contraction of short tandem repeat region">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls  --output exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U.bwa.mutect2.all.vcf.gz --stats temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/merged.stats --filtering-stats temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/filtering.stats --max-alt-allele-count 2 --contamination-table temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/contamination.table --tumor-segmentation temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/segments.table --orientation-bias-artifact-priors temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/artifact-priors.tar.gz --variant temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa.mutect2.raw.vcf.gz --reference /home/tgenref/homo_sapiens/grch38_hg38/hg38tgen/genome_reference/GRCh38tgen_decoy_alts_hla.fa --tmp-dir temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/temp_filter/  --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05 --initial-threshold 0.1 --mitochondria-mode false --max-events-in-region 2 --unique-alt-read-count 0 --min-median-mapping-quality 30 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity --min-reads-per-strand 0 --autosomal-coverage 0.0 --max-numt-fraction 0.85 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 
--log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.1.4.0",Date="December 7, 2019 11:15:33 AM MST">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2  --f1r2-tar-gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.f1r2.tar.gz --tumor-sample MMRF_1923_1_BM_CD138pos_T1 --normal-sample MMRF_1923_1_PB_WBC_C2 --germline-resource /home/tgenref/homo_sapiens/grch38_hg38/public_databases/gnomad/r3.0/gnomad.genomes.r3.0.sites.pass.AnnotationReference.vcf.gz --independent-mates true --output temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.mutect2.vcf.gz --intervals chr1:10001-207666 --intervals chr1:257667-297968 --intervals chr1:347969-535988 --intervals chr1:585989-2702781 --intervals chr1:2746291-12954384 --intervals chr1:13004385-16799163 --intervals chr1:16849164-29552233 --input exome/alignment/bwa/MMRF_1923_1_PB_WBC_C2_KHS5U/MMRF_1923_1_PB_WBC_C2_KHS5U.bwa.bam --input exome/alignment/bwa/MMRF_1923_1_BM_CD138pos_T1_KHS5U/MMRF_1923_1_BM_CD138pos_T1_KHS5U.bwa.bam --reference /home/tgenref/homo_sapiens/grch38_hg38/hg38tgen/genome_reference/GRCh38tgen_decoy_alts_hla.fa --tmp-dir temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/temp_mutect2_1/  --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --disable-adaptive-pruning false --dont-trim-active-regions false --max-extension 25 --padding-around-indels 150 --padding-around-snps 20 
--kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false 
--use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.4.0",Date="December 7, 2019 10:25:48 AM MST">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##INFO=<ID=UNIQ_ALT_READ_COUNT,Number=1,Type=Integer,Description="Number of ALT reads with unique start and mate end positions at a variant site">
##MutectVersion=2.2
##bcftools_concatCommand=concat --output-type z --output temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa.mutect2.raw.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/1.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/2.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/3.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/4.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/5.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/6.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/7.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/8.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/9.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/10.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/11.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/12.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/13.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/14.mutect2.vcf.gz 
temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/15.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/16.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/17.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/18.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/19.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/20.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/21.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/22.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/23.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/24.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/25.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/26.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/27.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/28.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/29.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/30.mutect2.vcf.gz 
temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/31.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/32.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/33.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/34.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/35.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/36.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/37.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/38.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/39.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/40.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/41.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/42.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/43.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/44.mutect2.vcf.gz temp/exome/somatic_variant_calls/mutect2/MMRF_1923_1_PB_WBC_C2_KHS5U-MMRF_1923_1_BM_CD138pos_T1_KHS5U_bwa/45.mutect2.vcf.gz; Date=Sat Dec  7 11:15:16 2019
##bcftools_concatVersion=1.10+htslib-1.10

pd3 commented 4 years ago

I am unable to reproduce the error with the header and the data line you provided. What is the exact command you are running? Any chance you could provide a test case?

PedalheadPHX commented 4 years ago

Happy to provide the example file. Do you have a DM link for the files?

pd3 commented 4 years ago

Thank you for the test case. The problem was introduced when 64-bit support was added to htslib. A minimal example to reproduce the problem:

$ cat test.vcf
##fileformat=VCFv4.2
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="dummy">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="dummy">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="dummy">
##contig=<ID=chr1,length=248956422>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    1   .   G   C   .   .   MPOS=-2147483648;NALOD=-8.279e-01;NLOD=15.45;POPAF=6.00

$ bcftools view test.vcf
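
The columns in the listing above appear space-separated as displayed here, while VCF fields are tab-delimited. For anyone who wants to recreate test.vcf, here is a small Python sketch that writes the same header and record with tabs. The helper names (`build_minimal_vcf`, `write_minimal_vcf`) are made up for illustration and are not part of bcftools or htslib.

```python
# Hypothetical helper for recreating the minimal test case with real tabs.
# The header and record values are copied from the example above.
HEADER_LINES = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=MPOS,Number=A,Type=Integer,Description="dummy">',
    '##INFO=<ID=NALOD,Number=A,Type=Float,Description="dummy">',
    '##INFO=<ID=NLOD,Number=A,Type=Float,Description="dummy">',
    '##INFO=<ID=POPAF,Number=A,Type=Float,Description="dummy">',
    "##contig=<ID=chr1,length=248956422>",
]
COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
RECORD = [
    "chr1", "1", ".", "G", "C", ".", ".",
    "MPOS=-2147483648;NALOD=-8.279e-01;NLOD=15.45;POPAF=6.00",
]


def build_minimal_vcf():
    """Return the test case as text, joining column values with tabs."""
    lines = HEADER_LINES + ["\t".join(COLUMNS), "\t".join(RECORD)]
    return "\n".join(lines) + "\n"


def write_minimal_vcf(path):
    """Write the test case to `path` with Unix line endings."""
    with open(path, "w", newline="\n") as handle:
        handle.write(build_minimal_vcf())


if __name__ == "__main__":
    # Show the data line with its tabs made visible.
    print(repr(build_minimal_vcf().splitlines()[-1]))
```

Calling `write_minimal_vcf("test.vcf")` and then running `bcftools view test.vcf` should exercise the same record as the example.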

jmarshall commented 4 years ago

Is it the case that the problematic line (from which Petr has distilled a minimal example) is in fact the line following the chr1 17000202 . A C line shown in @TGEN-BTurner's original report? (And if so, it would be great if you'd use zcat to post that line here too.)

(Or it may be several lines further on — the way that line has been clipped at …|1:17000 suggests that the ‘final’ line of output you're seeing is an artefact of stdout buffering.)
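
To illustrate the buffering point: when stdout is block-buffered and stderr is not, the error message can land in the middle of an earlier record, so the last visible record is not necessarily the one that failed. The following is a toy Python simulation of that effect, not bcftools or C stdio code. The class names, the 16-character buffer, and the record strings are all invented for the demonstration.

```python
# Toy model: a shared "terminal", a block-buffered stdout, an unbuffered stderr.
class Terminal:
    """Shared destination for both streams, like a console."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def contents(self):
        return "".join(self.chunks)


class BlockBuffered:
    """Holds text until `size` characters accumulate, then emits exactly `size`."""

    def __init__(self, sink, size):
        self.sink = sink
        self.size = size
        self.pending = ""

    def write(self, text):
        self.pending += text
        while len(self.pending) >= self.size:
            self.sink.write(self.pending[:self.size])
            self.pending = self.pending[self.size:]


def simulate(records, bad_index, size=16):
    """Return what the terminal shows up to the moment the error is reported."""
    term = Terminal()
    out = BlockBuffered(term, size)
    for i, rec in enumerate(records):
        if i == bad_index:
            term.write("[E] failed\n")  # stderr goes straight to the terminal
            break
        out.write(rec + "\n")  # stdout waits in the buffer
    return term.contents()


if __name__ == "__main__":
    recs = ["rec1 AAAAAAAA", "rec2 BBBBBBBB", "rec3 CCCCCCCC", "rec4 DDDDDDDD"]
    print(simulate(recs, bad_index=3))
```

In this run the fourth record is the one that fails, yet the error text appears directly after a clipped third record, because only whole 16-character blocks had been emitted by then. A real process may also flush the remaining buffer at exit, which this sketch leaves out.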

jkbonfield commented 4 years ago

Indeed, we still haven't seen the original data which triggered the whole problem. @pd3 - was the MPOS field you constructed for your example the same name and value that was culled from the test data you were provided? It would really help in a bug report to know that the issue we found and fixed is in fact the same one. @TGEN-BTurner can you please check whether PR samtools/htslib#1000 fixes your problem?

bryce-turner commented 4 years ago

I can confirm that samtools/htslib#1000 fixes the problem. I tested on a different sample than before, but here is the output before and after applying the fix:

Before:

chr1    43290221    .   T   A   .   base_qual;haplotype;weak_evidence   CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0 0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1    43290242    .   C   A   .   haplotype;weak_evidence CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0 0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1    43314053    .   TTGTG   T,TTG   .   germline;normal_artifact    CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;[E::bcf_fmt_array] Unexpected type 0

After:

chr1    43290221        .       T       A       .       base_qual;haplotype;weak_evidence       CONTQ=12;DP=12;ECNT=2;GERMQ=20;MBQ=32,10;MFRL=176,180;MMQ=60,60;MPOS=12;NALOD=1;NLOD=2.7;POPAF=6;ROQ=57;SEQQ=1;STRANDQ=16;TLOD=3.42  GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:2,1:0.4:3:2,0:0,0:0|1:43290221_T_A:43290221:2,0,1,0 0|0:9,0:0.091:9:4,0:3,0:0|1:43290221_T_A:43290221:5,4,0,0
chr1    43290242        .       C       A       .       haplotype;weak_evidence CONTQ=13;DP=13;ECNT=2;GERMQ=22;MBQ=37,31;MFRL=176,180;MMQ=60,60;MPOS=33;NALOD=1.04;NLOD=3;POPAF=6;ROQ=60;SEQQ=1;STRANDQ=18;TLOD=3.67    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:1,1:0.5:2:1,0:0,1:0|1:43290221_T_A:43290221:1,0,1,0 0|0:10,0:0.083:10:6,0:3,0:0|1:43290221_T_A:43290221:6,4,0,0
chr1    43314053        .       TTGTG   T,TTG   .       germline;normal_artifact        CONTQ=93;DP=208;ECNT=1;GERMQ=1;MBQ=38,38,38;MFRL=186,174,189;MMQ=60,60,60;MPOS=28,26;NALOD=-4.571,-20.77;NLOD=3.44,-18.72;POPAF=6,6;ROQ=93;RPA=11,9,10;RU=TG;SEQQ=93;STR;STRANDQ=44;STRQ=93;TLOD=3.28,17.87     GT:AD:AF:DP:F1R2:F2R1:SB        0/1/2:61,2,9:0.037,0.133:72:27,2,3:28,0,4:9,52,1,10     0/0:48,3,10:0.058,0.172:61:24,0,5:21,3,5:12,36,4,9
chr1    43363190        .       G       GT      .       normal_artifact;slippage;weak_evidence  CONTQ=30;DP=443;ECNT=1;GERMQ=93;MBQ=38,34;MFRL=181,184;MMQ=60,60;MPOS=21;NALOD=-3.447;NLOD=35.08;POPAF=6;ROQ=93;RPA=10,11;RU=T;SEQQ=1;STR;STRANDQ=54;STRQ=1;TLOD=3.29   GT:AD:AF:DP:F1R2:F2R1:SB        0/1:173,7:0.034:180:107,3:65,3:79,94,3,4        0/0:166,7:0.036:173:88,4:73,2:70,96,4,3
chr1    43422694        .       T       C       .       haplotype;normal_artifact;position;strand_bias  CONTQ=69;DP=269;ECNT=2;GERMQ=93;MBQ=37,31;MFRL=173,182;MMQ=60,60;MPOS=0;NALOD=-18.25;NLOD=8.84;POPAF=6;ROQ=64;SEQQ=93;STRANDQ=1;TLOD=21.52      GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:127,8:0.066:135:62,6:55,2:0|1:43422694_T_C:43422694:75,52,8,0       0|0:127,7:0.059:134:61,3:59,2:0|1:43422694_T_C:43422694:89,38,7,0
chr1    43422696        .       T       C       .       haplotype;normal_artifact;strand_bias   CONTQ=69;DP=279;ECNT=2;GERMQ=93;MBQ=38,33;MFRL=172,182;MMQ=60,60;MPOS=-2147483648;NALOD=-18.27;NLOD=8.58;POPAF=6;ROQ=55;SEQQ=93;STRANDQ=1;TLOD=21.51    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB     0|1:127,8:0.065:135:66,4:60,1:0|1:43422694_T_C:43422694:75,52,8,0       0|0:126,7:0.059:133:64,2:58,3:0|1:43422694_T_C:43422694:89,37,7,0
chr1    43499804        .       GT      G       .       slippage;weak_evidence  CONTQ=15;DP=19;ECNT=1;GERMQ=8;MBQ=39,36;MFRL=168,220;MMQ=60,60;MPOS=15;NALOD=0.715;NLOD=2.36;POPAF=6;ROQ=93;RPA=10,9;RU=T;SEQQ=1;STR;STRANDQ=14;STRQ=1;TLOD=3.67        GT:AD:AF:DP:F1R2:F2R1:SB        0/1:6,2:0.303:8:5,2:1,0:1,5,0,2 0/0:8,0:0.097:8:7,0:1,0:4,4,0,0
chr1    43592587        .       G       A       .       contamination;weak_evidence     CONTQ=1;DP=145;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=194,159;MMQ=60,60;MPOS=33;NALOD=1.86;NLOD=21.07;POPAF=4.85;ROQ=44;SEQQ=1;STRANDQ=8;TLOD=3.56      GT:AD:AF:DP:F1R2:F2R1:SB        0/1:61,2:0.045:63:35,2:25,0:3,58,0,2    0/0:70,0:0.014:70:45,0:25,0:3,67,0,0
chr1    43621836        .       C       T       .       contamination;weak_evidence     CONTQ=1;DP=209;ECNT=1;GERMQ=93;MBQ=39,39;MFRL=197,217;MMQ=60,60;MPOS=35;NALOD=2.02;NLOD=30.39;POPAF=6;ROQ=63;SEQQ=1;STRANDQ=8;TLOD=3.08 GT:AD:AF:DP:F1R2:F2R1:SB        0/1:96,2:0.03:98:58,1:35,1:90,6,2,0     0/0:101,0:0.009441:101:65,0:36,0:92,9,0,0

jkbonfield commented 4 years ago

On request, the proposal is now a bit different. That MPOS=-2147483648 will become MPOS=. (the missing value). This is to permit such data to be written to BCF. That's over in samtools/htslib#1004.

I think this is fine. The -2147483648 is just the result of a ghastly bug due to failure to initialise a variable correctly. Replacing it with the "missing" value is the most accurate representation of what happened.
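
For context on why -2147483648 is awkward for BCF: as I read the BCF2 section of the VCF specification, each integer width reserves its eight smallest values, with the very smallest used as the "missing" marker, so -2147483648 has the same bit pattern as a missing 32-bit integer. The Python sketch below illustrates that rule and the proposed mapping to "."; it is not the htslib implementation, and the function names are hypothetical.

```python
# Illustrative only: not htslib code. The limits assume the BCF2 rule that
# each integer width reserves its eight smallest values (missing,
# end-of-vector, and six values held for future use).
RESERVED = 8
INT32_MISSING = -(1 << 31)  # -2147483648, the 32-bit "missing" pattern


def usable_range(bits):
    """Return (lowest, highest) value storable in a BCF integer of this width."""
    lowest = -(1 << (bits - 1)) + RESERVED
    highest = (1 << (bits - 1)) - 1
    return lowest, highest


def smallest_bcf_int_type(value):
    """Pick the narrowest width that can hold `value`, or None if none can."""
    for bits in (8, 16, 32):
        lowest, highest = usable_range(bits)
        if lowest <= value <= highest:
            return f"int{bits}"
    return None


def format_info_int(value):
    """Render an INFO integer as VCF text, using '.' for the missing pattern."""
    return "." if value == INT32_MISSING else str(value)


if __name__ == "__main__":
    for v in (12, 33, -2147483648):
        print(v, smallest_bcf_int_type(v), format_info_int(v))
```

Under these assumptions no width can store -2147483648 as an ordinary value, which is consistent with writing it out as MPOS=. instead.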