Open jeromekelleher opened 1 month ago
max_mutations_per_site is also much higher at 2060 vs 890.
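For anyone wanting to rank sites the same way, here is a minimal sketch. It assumes `max_mutations_per_site` is simply the largest number of mutations recorded at any single site, and it works on a plain list of mutation positions rather than the real tree sequence. The function name and the toy data are hypothetical.

```python
from collections import Counter


def mutations_per_site(mutation_positions):
    """Return (position, count) pairs, most heavily mutated site first."""
    return Counter(mutation_positions).most_common()


# Toy data for illustration only, not taken from the real tree sequence.
positions = [21846, 28271, 21846, 27638, 21846, 28271]
ranked = mutations_per_site(positions)

# Assumed definition: the largest per-site mutation count.
max_mutations_per_site = ranked[0][1]
```

Walking down `ranked` gives the candidate sites to inspect in order of mutation count.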
This site is a likely cause:
The mutation C>T at position 21846 lies within amplicon 72 (ARTIC v3), which suffers from dropout in Delta samples, and so the position may be affected by sequencing artifacts (see this paper).
Do the reversions tend to happen within Delta (B.1.617.2) lineages?
Confirming that this site is the one with a chain of 128 successive mutations. The mutations with > 2 parents are flip-flopping between C and T.
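A rough way to quantify the flip-flopping, sketched under the assumption that we already have the ancestral state and the derived states along one chain of successive mutations, in order from oldest to newest. The helper name is hypothetical. A step counts as a reversion when it restores the state that the previous mutation replaced.

```python
def count_reversions(ancestral_state, derived_states):
    """Count mutations in a chain that revert the previous mutation.

    derived_states lists the derived state of each mutation in the chain,
    oldest first (assumed input format).
    """
    states = [ancestral_state] + list(derived_states)
    # states[i - 2] is the state that mutation i - 1 replaced, so
    # states[i] == states[i - 2] means mutation i undid mutation i - 1.
    return sum(1 for i in range(2, len(states)) if states[i] == states[i - 2])


# Toy chain C > T > C > T > C: every mutation after the first is a reversion.
toy_reversions = count_reversions("C", ["T", "C", "T", "C"])
```

A chain where nearly every step is a reversion is what you would expect from a site alternating between C and T, rather than from genuine recurrent mutation to different bases.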
28271 has the next-highest mutation count, and is also likely problematic:
Note that 28271 showed similar problems in the GISAID data, so it seems pretty likely to be problematic and is a good candidate for exclusion.
The site with the next-highest mutation count is 27638. This looks different, with consistent flicking back and forth between T and C:
27752 seems quite similar:
As of 2021-02-26 we have a max_mutation_parents value of 129, which is clearly pathological. Investigate.
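As a starting point for the investigation, here is a sketch of how the parent count could be computed. It assumes `max_mutation_parents` is the largest number of ancestral mutations reached by following parent pointers from any one mutation, and that the parents are available as a plain list in which -1 means "no parent" (the convention tskit uses for its mutation table). The function name is hypothetical.

```python
def count_mutation_parents(parent):
    """For each mutation, count the mutations above it in its parent chain.

    parent[j] is the ID of the parent of mutation j, or -1 if it has none.
    Assumes there are no cycles in the parent pointers.
    """
    depth = [None] * len(parent)
    for start in range(len(parent)):
        # Walk upwards until reaching a root or a mutation already solved.
        path = []
        u = start
        while u != -1 and depth[u] is None:
            path.append(u)
            u = parent[u]
        base = -1 if u == -1 else depth[u]
        # Fill in depths from the top of the walked path downwards.
        for v in reversed(path):
            base += 1
            depth[v] = base
    return depth


# Toy parent array: two chains, 0 <- 1 <- 2 and 3 <- 4.
toy_depths = count_mutation_parents([-1, 0, 1, -1, 3])
toy_max_parents = max(toy_depths)
```

Taking the argmax of the returned list identifies the mutation at the bottom of the longest chain, and from there the site responsible.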