luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Wrong number of AC fields for multiallelics, cannot split using bcftools #212

Closed mvelinder closed 2 years ago

mvelinder commented 2 years ago

Describe the bug: I successfully ran a trio using population calling, but I am getting a bcftools norm error when trying to normalize and split multiallelic records.

Version

$ octopus --version
octopus version 0.6.3-beta
Target: x86_64 Linux 5.3.0-1032-azure
Compiler: GNU 7.5.0
Boost: 1_70

Here's the bcftools error:

bcftools norm -c s -m - -f $FASTA $VCF -O z -o $VCF.norm.vcf.gz
Error: wrong number of fields in INFO/AC at chr1:976746, expected 2, found 1

There are two variant records at this position. This one:

chr1    976746  .   G   A,* 10000   PASS    AC=7,1;AN=8;DP=162;MQ=55;MQ0=0;NS=4 GT:GQ:DP:MQ:PS:PQ:FT    1|1:852:52:55:976669:99:PASS    1|2:131:45:54:976669:99:PASS    1|1:815:34:56:976669:99:PASS    1|1:159:31:56:976669:99:PASS

And this one, which seems to be the record causing the error:

chr1    976746  .   GC  G,*C    10000   PASS    AC=5;AN=8;DP=162;MQ=55;MQ0=0;NS=4   GT:GQ:DP:MQ:PS:PQ:FT    1|0:999:52:55:976669:99:PASS    1|2:999:45:54:976669:99:PASS    1|0:999:34:56:976669:99:PASS    1|0:425:31:56:976669:99:PASS

... with only one AC value.
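
For context, the VCF specification declares INFO/AC as Number=A, meaning one value per ALT allele, which is why bcftools expects 2 values for a record with two ALT alleles. A minimal Python sketch for locating such records before running bcftools norm might look like the following. The function name `find_ac_mismatches` is hypothetical, the records are the two from this issue trimmed to their first eight columns, and the parser assumes plain-text, tab-separated VCF lines with a simple `key=value` INFO layout.

```python
def find_ac_mismatches(vcf_lines):
    """Yield (chrom, pos, n_alt, n_ac) for records whose INFO/AC
    value count differs from the number of ALT alleles."""
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, alt, info = fields[0], fields[1], fields[4], fields[7]
        n_alt = len(alt.split(","))
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "AC":
                n_ac = len(value.split(","))
                if n_ac != n_alt:
                    yield chrom, int(pos), n_alt, n_ac
                break


# The two records from this issue, trimmed to the first eight columns.
records = [
    "chr1\t976746\t.\tG\tA,*\t10000\tPASS\tAC=7,1;AN=8;DP=162;MQ=55;MQ0=0;NS=4",
    "chr1\t976746\t.\tGC\tG,*C\t10000\tPASS\tAC=5;AN=8;DP=162;MQ=55;MQ0=0;NS=4",
]

for hit in find_ac_mismatches(records):
    print(hit)
```

Only the second record is reported, since it lists two ALT alleles (`G,*C`) but a single AC value. For a real file, the same function could be fed the lines of an uncompressed VCF opened with `open(path)`.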

Any ideas? This blocks downstream analysis with tools like slivar, which require multiallelic records to be split.

I also had to include -c s in the bcftools norm command because I was otherwise receiving this bcftools error:

Reference allele mismatch at chr1:248752514 .. REF_SEQ:'N' vs VCF:'A'

Not sure about that one either. I checked manually: the REF allele is indeed listed as A in the VCF, while the base at that position is definitely N in the same FASTA used for variant calling.
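
A manual check like this can be sketched in Python by comparing the VCF REF allele with the bases at the same 1-based position in the reference. The sketch assumes the chromosome sequence is already loaded as a plain string (in practice an indexed FASTA reader would be the usual choice). `ref_matches` is a hypothetical helper, not part of bcftools or octopus, and the sequence is a toy example rather than real chr1 data.

```python
def ref_matches(sequence, pos, ref):
    """Return True if `ref` equals the reference bases starting at 1-based `pos`."""
    start = pos - 1  # VCF positions are 1-based; Python slices are 0-based
    observed = sequence[start:start + len(ref)]
    return observed.upper() == ref.upper()


# Toy sequence for illustration only, not real chr1 data.
toy_reference = "ACGTNNNN"

print(ref_matches(toy_reference, 5, "A"))  # REF 'A' compared against an 'N' base
print(ref_matches(toy_reference, 1, "A"))  # REF 'A' compared against an 'A' base
```

The first call mirrors the situation reported here, where the VCF says A and the reference holds N, so the comparison fails. The second shows a matching site.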

Thanks for the help!

dancooke commented 2 years ago

I think this may have already been fixed. Could you try v0.7.4?

mvelinder commented 2 years ago

Thanks @dancooke, I reran with 0.7.4 on this region and successfully split the multiallelics!