Haplotype frequencies, as given by e.g. MAP_HF seem to be incorrect. They disagree both with the evidence BAM and also the AD field. For example, at the first three lines in the block below, the first haplotype (1-0-0) has frequency 0.85 and the second haplotype (1-1-1) has frequency 0.15. However, the AD counts for the alleles in the second haplotype are much less than 15%. For the second locus we have 42/(42+1214) = 3.3% and 18/(18+1683) = 1.1%
(I wondered if this was somehow a result of the downsampling commands, but they don't seem to make a difference).
LT635612 193308 . C G 4502.89 PASS AC=2;AN=2;DP=162;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|1:360:162:59:193308:100:285.801,52:0.85,0.15:.,1282:PASS
LT635612 193321 . T TC 355.4 AF AC=1;AN=2;DP=172;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 0|1:355:172:59:193308:100:285.801,52:0.85,0.15:1214,42:AF
LT635612 193558 . C T 357.86 AF AC=1;AN=2;DP=139;MQ=60;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 0|1:358:139:60:193308:100:285.801,52:0.85,0.15:1683,18:AF
LT635612 193795 . G A 3644.72 PASS AC=1;AN=1;DP=147;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:AD:FT 1:7:147:59:193795:100:.,1635:PASS
LT635612 194653 . TC T 216.34 PASS AC=1;AN=2;DP=281;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|0:216:281:59:194653:100:143.245,367.755:0.28,0.72:1051,142:PASS
LT635612 195122 . T A 815.72 AF AC=1;AN=2;DP=142;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|0:816:142:59:195122:100:88,528.363:0.14,0.86:1802,28:AF
LT635612 195264 . A T 554.15 AF AC=1;AN=2;DP=130;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|0:554:130:59:195122:100:88,528.363:0.14,0.86:1757,25:AF
LT635612 195681 . C G 3531.62 PASS AC=2;AN=2;DP=119;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|1:201:119:59:195681:100:253.591,41:0.86,0.14:.,1291:PASS
LT635612 195834 . T TA 201.15 AF AC=1;AN=2;DP=176;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 0|1:201:176:59:195681:100:253.591,41:0.86,0.14:1419,39:AF
LT635612 196538 . A G 3612.96 PASS AC=2;AN=2;DP=118;MQ=60;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|1:552:118:60:196538:100:468.126,116.874:0.8,0.2:.,1299:PASS
LT635612 196622 . G A 2634.9 AF AC=1;AN=2;DP=121;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|0:552:121:59:196538:100:468.126,116.874:0.8,0.2:25,1476:AF
LT635612 197542 . A G 2456.93 AF AC=1;AN=2;DP=119;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 0|1:287:119:59:197542:100:44,229.239:0.16,0.84:15,1491:AF
LT635612 197704 . G GA 98.69 AF AC=1;AN=2;DP=180;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 0|1:99:180:59:197704:100:207.423,22:0.91,0.094:2001,21:AF
LT635612 198863 . CT C 72.96 AF AC=1;AN=2;DP=167;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|0:73:167:59:198863:100:79,807.651:0.089,0.91:1899,18:AF
LT635612 198931 . T G 3824.19 PASS AC=2;AN=2;DP=155;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:AD:FT 1|1:73:155:59:198863:100:79,807.651:0.089,0.91:.,1883:PASS
Version
$ octopus --version
octopus version 0.7.0 (develop b83ce113)
Target: x86_64 Linux 2.6.32-642.11.1.el6.x86_64
SIMD extension: SSE2
Compiler: GNU 10.1.0
Boost: 1_74
Describe the bug
Haplotype frequencies, as given by e.g. MAP_HF seem to be incorrect. They disagree both with the evidence BAM and also the AD field. For example, at the first three lines in the block below, the first haplotype (1-0-0) has frequency 0.85 and the second haplotype (1-1-1) has frequency 0.15. However, the AD counts for the alleles in the second haplotype are much less than 15%. For the second locus we have 42/(42+1214) = 3.3% and 18/(18+1683) = 1.1%
(I wondered if this was somehow a result of the downsampling commands, but they don't seem to make a difference).
Version
Command line to run octopus: