luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
301 stars 37 forks source link

Haplotype frequencies seem incorrect with polyclone #141

Closed bredelings closed 3 years ago

bredelings commented 3 years ago

Describe the bug

Haplotype frequencies, as given by e.g. MAP_HF seem to be incorrect. They disagree both with the evidence BAM and also the AD field. For example, at the first three lines in the block below, the first haplotype (1-0-0) has frequency 0.85 and the second haplotype (1-1-1) has frequency 0.15. However, the AD counts for the alleles in the second haplotype are much less than 15%. For the second locus we have 42/(42+1214) = 3.3% and 18/(18+1683) = 1.1%

(I wondered if this was somehow a result of the downsampling commands, but they don't seem to make a difference).

LT635612        193308  .       C       G       4502.89 PASS    AC=2;AN=2;DP=162;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|1:360:162:59:193308:100:285.801,52:0.85,0.15:.,1282:PASS
LT635612        193321  .       T       TC      355.4   AF      AC=1;AN=2;DP=172;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      0|1:355:172:59:193308:100:285.801,52:0.85,0.15:1214,42:AF
LT635612        193558  .       C       T       357.86  AF      AC=1;AN=2;DP=139;MQ=60;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      0|1:358:139:60:193308:100:285.801,52:0.85,0.15:1683,18:AF
LT635612        193795  .       G       A       3644.72 PASS    AC=1;AN=1;DP=147;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:AD:FT 1:7:147:59:193795:100:.,1635:PASS
LT635612        194653  .       TC      T       216.34  PASS    AC=1;AN=2;DP=281;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|0:216:281:59:194653:100:143.245,367.755:0.28,0.72:1051,142:PASS
LT635612        195122  .       T       A       815.72  AF      AC=1;AN=2;DP=142;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|0:816:142:59:195122:100:88,528.363:0.14,0.86:1802,28:AF
LT635612        195264  .       A       T       554.15  AF      AC=1;AN=2;DP=130;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|0:554:130:59:195122:100:88,528.363:0.14,0.86:1757,25:AF
LT635612        195681  .       C       G       3531.62 PASS    AC=2;AN=2;DP=119;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|1:201:119:59:195681:100:253.591,41:0.86,0.14:.,1291:PASS
LT635612        195834  .       T       TA      201.15  AF      AC=1;AN=2;DP=176;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      0|1:201:176:59:195681:100:253.591,41:0.86,0.14:1419,39:AF
LT635612        196538  .       A       G       3612.96 PASS    AC=2;AN=2;DP=118;MQ=60;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|1:552:118:60:196538:100:468.126,116.874:0.8,0.2:.,1299:PASS
LT635612        196622  .       G       A       2634.9  AF      AC=1;AN=2;DP=121;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|0:552:121:59:196538:100:468.126,116.874:0.8,0.2:25,1476:AF
LT635612        197542  .       A       G       2456.93 AF      AC=1;AN=2;DP=119;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      0|1:287:119:59:197542:100:44,229.239:0.16,0.84:15,1491:AF
LT635612        197704  .       G       GA      98.69   AF      AC=1;AN=2;DP=180;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      0|1:99:180:59:197704:100:207.423,22:0.91,0.094:2001,21:AF
LT635612        198863  .       CT      C       72.96   AF      AC=1;AN=2;DP=167;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|0:73:167:59:198863:100:79,807.651:0.089,0.91:1899,18:AF
LT635612        198931  .       T       G       3824.19 PASS    AC=2;AN=2;DP=155;MQ=59;NS=1     GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT      1|1:73:155:59:198863:100:79,807.651:0.089,0.91:.,1883:PASS

Version

$ octopus --version
octopus version 0.7.0 (develop b83ce113)
Target: x86_64 Linux 2.6.32-642.11.1.el6.x86_64
SIMD extension: SSE2
Compiler: GNU 10.1.0
Boost: 1_74

Command line to run octopus:

$ octopus -I ERR2679006.bam -R ~/malaria/reference/plasmo-combined.fasta -T LT635612 -o chr1.poly2.vcf.gz --bamout chr1.poly2.bam --bamout-type FULL  --filter-expression "QUAL < 10 | MQ < 10 | MP < 10 | AF < 0.05 | SB > 0.98 | BQ < 15 | DP < 1 | AD < 1" --annotations AD -C polyclone  --max-clones 2 --threads 15 --downsample-above=300 --downsample-target=200
dancooke commented 3 years ago

Duplicate of #142