luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

variant position in repeats #172

Closed: cariaso closed this issue 3 years ago

cariaso commented 3 years ago

https://github.com/KarchinLab/open-cravat/issues/64 is a specific real-world case of a variant within a TG repeat, where the lack of a standard positioning convention across tools and databases seems to cause a problem.

Does octopus enforce a policy for positioning such variants (5'-most? 3'-most? best effort or guaranteed?) What are your views on whether standardization across different tools is achievable, and whether it is desirable?
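To make the ambiguity concrete: within a tandem repeat, deleting one repeat unit at several different offsets produces exactly the same sequence, so several positions are equally "correct". The sketch below is a hypothetical helper (not part of octopus or any VCF tool) that, assuming a plain string reference and 0-based coordinates without a VCF anchor base, lists every start position whose deletion yields the same result.

```python
def equivalent_deletion_positions(ref: str, pos: int, length: int) -> list[int]:
    """Hypothetical helper: return every 0-based start at which deleting
    `length` bases from `ref` gives the same sequence as deleting at `pos`."""
    target = ref[:pos] + ref[pos + length:]
    return [
        start
        for start in range(len(ref) - length + 1)
        if ref[:start] + ref[start + length:] == target
    ]


# A (TG)3 repeat flanked by AC...A; delete one "TG" unit starting at index 2.
print(equivalent_deletion_positions("ACTGTGTGA", 2, 2))  # [2, 3, 4, 5, 6]
```

Every listed start describes the same deletion; the 5'-most (leftmost on the forward strand) is 2 and the 3'-most is 6. A caller or database must pick one of these by convention.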

dancooke commented 3 years ago

Octopus left-aligns variants as described here. You can consider this a guarantee: if you find instances where variants are not left-aligned, I'm happy to treat it as a bug.
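For readers unfamiliar with left alignment, here is a minimal sketch of the standard left-normalization procedure (the approach popularized by `vt normalize`). It is not octopus's code; the function name and the 0-based, anchor-included allele convention are assumptions for illustration. Repeatedly trim a shared last base, extend both alleles one base to the left whenever one becomes empty, then trim redundant shared leading bases.

```python
def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Hypothetical sketch: left-normalize a biallelic variant.

    `pos` is the 0-based start of `ref` in `ref_seq`, and `ref`/`alt` include
    the VCF-style anchor base for indels. Assumes ref_seq[pos:pos+len(ref)] == ref.
    """
    while True:
        changed = False
        # Drop a shared trailing base while both alleles are non-empty.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            base = ref_seq[pos]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Remove redundant shared leading bases, keeping at least one base each.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Right-most representation of a TG deletion in ACTGTGTGA (anchor G at index 5).
print(left_align("ACTGTGTGA", 5, "GTG", "G"))  # (1, 'CTG', 'C')
```

The deletion is shifted to the 5' end of the repeat, anchored on the C before it (1-based VCF POS 2). Any tool applying the same rule reports the same record, which is the consistency the question asks about.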

I'm not convinced standardising left-alignment in VCF is the way to go. There are other representation issues that left alignment doesn't solve, which is why tools like hap.py and Haplosaurus exist. Furthermore, as @rkimoakbioinformatics mentions in the open-cravat issue, different representations can have distinct biological interpretations. In principle, one could try to determine the actual mutation event that occurred (e.g., the units deleted within a tandem repeat); enforcing left-alignment would make it difficult to convey such information.
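One representation issue that left alignment leaves untouched is how variants are split or merged: a two-base substitution can be written as one MNP record or as two SNV records, and normalizing each record individually does not make these identical. Haplotype-aware comparison tools work around this by comparing the sequences the variants produce. The sketch below captures that core idea for one haplotype; it is a simplified illustration with a hypothetical helper name, not hap.py's or Haplosaurus's actual implementation, and it assumes sorted, non-overlapping records with 0-based positions.

```python
def apply_variants(ref_seq: str, variants: list[tuple[int, str, str]]) -> str:
    """Hypothetical helper: build one haplotype by applying non-overlapping
    (pos, ref, alt) records, with 0-based positions, to the reference."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at {pos}")
        if ref_seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at {pos}")
        pieces.append(ref_seq[cursor:pos])  # unchanged reference before the variant
        pieces.append(alt)                  # the alternate allele itself
        cursor = pos + len(ref)
    pieces.append(ref_seq[cursor:])         # remaining reference tail
    return "".join(pieces)


# One MNP versus two SNVs: different records, same haplotype.
mnp = apply_variants("ACGTA", [(1, "CG", "TT")])
snvs = apply_variants("ACGTA", [(1, "C", "T"), (2, "G", "T")])
print(mnp, snvs, mnp == snvs)  # ATTTA ATTTA True
```

The same check also treats left- and right-shifted placements of a repeat-unit deletion as equal, because both produce the same sequence. That is why comparing at the haplotype level, rather than enforcing one record layout, sidesteps the positioning question.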

cariaso commented 3 years ago

Thanks for the information.