luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Clarification of genotype calls for somatic calling model #96

Closed. jbedo closed this issue 4 years ago

jbedo commented 4 years ago

I have some output from the somatic calling model:

chr11   17483350        .       G       A       2.41    PASS    AC=1;AN=5;DP=526;MP=17.77;MQ=59;NS=2;PP=2.41;SOMATIC    GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:HF_CR:FT   0|0:217:169:59:17483350:100:254.684,246.316:0.51,0.49:0.47,0.54,0.45,0.53:PASS       0|1|0:217:357:58:17483350:100:432.9,14,389.388:0.52,0.016,0.47:0.49,0.55,0.0099,0.024,0.44,0.49:PASS

Could you clarify what is intended by a somatic variant with GT call 0|1|0 but germline call 0|0? My understanding was that the first two columns are germline not somatic haplotypes.

This call was produced by the latest dev branch. I was previously using the latest release, but it segfaulted on this data, so I ran the latest dev branch to see whether the segfault had been fixed. I have not observed such calls with the release.

dancooke commented 4 years ago

> Could you clarify what is intended by a somatic variant with GT call 0|1|0 but germline call 0|0? My understanding was that the first two columns are germline not somatic haplotypes.

In v0.6.3-beta, any somatic haplotypes did indeed appear after the germline haplotypes. However, from v0.7.0 onwards, the order of GT will be consistent across all calling models: lexicographical order. Your example was due to a bug that I've just fixed (09513af91387cde9b16ef5d7b9f82f27568c3d59). The tumour genotype should now be 1|0|0, since A < G under lexicographical ordering. Does this cause issues for you? I could potentially add a FORMAT field indicating which haplotypes are somatic, but it would be a little redundant since the somatic haplotypes can easily be identified from the provided information.
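As a rough illustration of the ordering described above, the following Python sketch sorts the allele indices of a phased GT by allele sequence and then picks out the tumour alleles that are not accounted for by the normal genotype. Both `reorder_gt` and `somatic_alleles` are hypothetical helpers written for this example and are not part of Octopus. The sketch assumes a single-site phase set and fully called genotypes (no `.` entries), and the multiset comparison is only one plausible way to identify the somatic haplotype.

```python
from collections import Counter


def reorder_gt(ref, alts, gt):
    """Sort phased allele indices by allele sequence (hypothetical helper).

    Assumes a single-site phase set and no missing ('.') alleles.
    """
    alleles = [ref] + list(alts)
    indices = [int(i) for i in gt.split("|")]
    # Lexicographical order on the allele strings, not on the indices.
    indices.sort(key=lambda i: alleles[i])
    return "|".join(str(i) for i in indices)


def somatic_alleles(normal_gt, tumour_gt):
    """Allele indices present in the tumour GT beyond those in the normal GT."""
    extra = Counter(tumour_gt.split("|")) - Counter(normal_gt.split("|"))
    return sorted(int(i) for i in extra.elements())


# chr11:17483350, REF=G, ALT=A, tumour GT reported as 0|1|0.
print(reorder_gt("G", ["A"], "0|1|0"))   # 1|0|0
print(somatic_alleles("0|0", "1|0|0"))   # [1]
```

For genotypes phased across several sites, the comparison would need to be made on whole haplotypes rather than per-site alleles.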

jbedo commented 4 years ago

Thanks for the clarification. I think it's fine to just work out the lexicographic ordering rather than adding an additional field. I've verified that 09513af fixed the ordering on my data.

jbedo commented 4 years ago

The ordering bug is still not fixed. I have the following call using the latest dev:

chr11   126162610       .       C       A       10.14   PASS    AC=1;AN=5;DP=256;MP=4.34;MQ=57;NS=2;PP=10.14;SOMATIC    GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:HF_CR:FT        0|0:153:112:56:126162607:100:142.962,188.038:0.43,0.57:0.39,0.48,0.52,0.61:PASS 0|1|0:153:144:58:126162607:100:140.883,9.9,259.171:0.34,0.024,0.63:0.31,0.38,0.013,0.038,0.59,0.67:PASS

dancooke commented 4 years ago

What is the call at chr11:126162607?

jbedo commented 4 years ago

chr11   126162607       .       G       A       1515.79 MP      AC=3;AN=5;DP=253;MP=4.34;MQ=57;NS=2;PP=1515.79  GT:GQ:DP:MQ:PS:PQ:FT    1|0:1516:108:56:126162607:100:MP        1|1|0:1516:145:58:126162607:100:MP
chr11   126162610       .       C       A       10.14   PASS    AC=1;AN=5;DP=256;MP=4.34;MQ=57;NS=2;PP=10.14;SOMATIC    GT:GQ:DP:MQ:PS:PQ:HPC:MAP_HF:HF_CR:FT   0|0:153:112:56:126162607:100:142.962,188.038:0.43,0.57:0.39,0.48,0.52,0.61:PASS 0|1|0:153:144:58:126162607:100:140.883,9.9,2

dancooke commented 4 years ago

This should be fixed in the latest develop branch (as of 29e4d4778abc83333345eb52919d7ed7a4b19526).
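The records at chr11:126162607 and chr11:126162610 share phase set 126162607, so any ordering has to be applied to whole haplotypes rather than to each site independently. The Python sketch below is one reading of what lexicographical ordering could mean in that case: it joins the per-site alleles into haplotypes, sorts them, and rewrites the GT at each site. The helpers `haplotypes`, `sorted_gts` and `somatic_haplotypes` are hypothetical and not taken from Octopus, and the thread does not show the corrected record, so the output here is only what this assumption predicts.

```python
from collections import Counter


def haplotypes(sites):
    """Build haplotypes for one sample from (ref, alts, gt) tuples.

    Assumes all sites belong to one phase set, are in position order,
    and have no missing ('.') alleles.
    """
    columns = []
    for ref, alts, gt in sites:
        alleles = [ref] + list(alts)
        columns.append([alleles[int(i)] for i in gt.split("|")])
    # Transpose per-site columns into per-haplotype tuples of alleles.
    return [tuple(h) for h in zip(*columns)]


def sorted_gts(sites):
    """Rewrite each site's GT after sorting the haplotypes lexicographically."""
    haps = sorted(haplotypes(sites))
    gts = []
    for pos, (ref, alts, _) in enumerate(sites):
        alleles = [ref] + list(alts)
        gts.append("|".join(str(alleles.index(h[pos])) for h in haps))
    return gts


def somatic_haplotypes(normal_sites, tumour_sites):
    """Tumour haplotypes not accounted for by the normal haplotypes."""
    extra = Counter(haplotypes(tumour_sites)) - Counter(haplotypes(normal_sites))
    return sorted(extra.elements())


# GT values as reported in the thread for the two phased sites.
tumour = [("G", ["A"], "1|1|0"), ("C", ["A"], "0|1|0")]
normal = [("G", ["A"], "1|0"), ("C", ["A"], "0|0")]

print(sorted_gts(tumour))                  # ['1|1|0', '1|0|0']
print(somatic_haplotypes(normal, tumour))  # [('A', 'A')]
```

Under this assumption the somatic haplotype carries A at both sites and sorts first, which would put the somatic allele in the first GT column at chr11:126162610.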

jbedo commented 4 years ago

Confirmed resolved. Thanks for the quick fix!