harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Incorrect multi-base REF/ALT alleles #107

Closed. tsackton closed this issue 2 weeks ago.

tsackton commented 1 year ago

If a site has a multinucleotide or complex mutation, it is possible to end up with multi-base REF and ALT alleles even though the remaining variant differs at only a single base, e.g., REF = AAA, ALT = ATA. This appears to happen when all the individuals that carry the multi-base variant are removed from the VCF: the REF/ALT representation of the remaining alleles is not updated to reflect only the alleles that are left.

For example, a tri-allelic site with alleles AAA, ATA, and A-- becomes AAA, ATA if the individual carrying A-- is removed, when it should be reduced to the single-base change A > T.
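A minimal Python sketch of the kind of update that is missing, assuming alleles are given as plain VCF strings (the `A--` deletion above is written as `A` in a VCF). The helper names `trim_alleles` and `subset_and_trim` are hypothetical and this is not necessarily the approach taken in #199: after dropping ALT alleles no remaining sample carries, bases shared by every remaining allele are trimmed from the right, then from the left (shifting POS), always keeping at least one base per allele. It only trims; it does not left-align indels against a reference.

```python
from typing import Dict, List, Sequence, Set, Tuple


def trim_alleles(pos: int, ref: str, alts: Sequence[str]) -> Tuple[int, str, List[str]]:
    """Trim bases shared by all alleles, keeping at least one base per allele.

    pos is the 1-based VCF POS of the first base of ref (hypothetical helper).
    """
    if not alts:
        # Monomorphic after subsetting: leave the record unchanged.
        return pos, ref, []
    alleles = [ref, *alts]
    # Right-trim: drop a shared trailing base while every allele is longer than 1 bp.
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Left-trim: drop a shared leading base; POS moves one base to the right.
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]


def subset_and_trim(
    pos: int, ref: str, alts: Sequence[str], observed: Set[int]
) -> Tuple[int, str, List[str], Dict[int, int]]:
    """Keep only ALT alleles still seen in genotypes, then re-trim.

    observed holds the 1-based ALT indices present in the remaining genotypes.
    Also returns a map from old to new allele index for rewriting GT fields.
    """
    kept_idx = [i for i in range(1, len(alts) + 1) if i in observed]
    # REF (index 0) keeps index 0; surviving ALTs are renumbered from 1.
    remap = {0: 0, **{old: new for new, old in enumerate(kept_idx, start=1)}}
    new_pos, new_ref, new_alts = trim_alleles(pos, ref, [alts[i - 1] for i in kept_idx])
    return new_pos, new_ref, new_alts, remap


if __name__ == "__main__":
    # The example from this issue: the A-- (VCF "A") carrier has been removed.
    print(subset_and_trim(100, "AAA", ["ATA", "A"], {1}))
    # -> (101, 'A', ['T'], {0: 0, 1: 1})
```

With the deletion still present, `AAA` / `A` share no trimmable base, so the record stays multi-base; only once that allele is gone can the site collapse to the single-base `A` > `T` at the next position.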

We have a conceptual fix, but it is not implemented yet. This issue exists to document the problem.

tsackton commented 2 weeks ago

Should be fixed by #199.