Open iqbal-lab opened 2 years ago
Note this is also an issue with nanopore data
G here:
and V here:
Definitely want to retest this with the refactor once it has stabilised, as this was definitely not fixed in earlier tags of the refactor. Recent fixes, including the primer-dimer filter (min frag length), suggest it is worth another look. Leaving open.
Raising this issue to be followed up once we have a release candidate to evaluate.
Theo Sanderson spotted that if we look at spike 142 and colour a tree by genotype at that position, it looks a bit messy. This may be due to the issue he has previously raised (https://www.medrxiv.org/content/10.1101/2021.10.14.21264847v2). Adding text from a message here: the issue Theo found previously occurs in Delta when primers are incorporated into the middle of a big amplicon (https://virological.org/t/missing-g21987a-mutation-in-sars-cov-2-delta-variants-due-to-non-specific-amplification-by-artic-v3-primers/764), which makes things hard, and sometimes impossible, to fix, depending on whether fragmentation is used.
This is a tree of genomes assembled by viridian 0.3.7
Our intention was that read-filtering should have prevented this; this from Theo:
"Re jumbo amplicon. If the amplicon looks like
[amplicon 71] [amplicon 72_LEFT primer] [amplicon 72] [amplicon 72_RIGHT primer] [amplicon 73]
If you then fragment that into 50bp fragments, then it's hard to tell if a [amplicon 71] [amplicon 72_LEFT] section is from this jumbo amplicon or is a legitimate read from the bit of amplicon 71 that overlaps amplicon 72"
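
To make Theo's point concrete, here is a minimal Python sketch of the ambiguity. It is not how viridian filters reads. The coordinates are made up (they are not real ARTIC positions), and `Interval`, `AMPLICONS`, `JUMBO` and `possible_sources` are hypothetical names used only for illustration. It treats a fragment as compatible with a product if it lies wholly inside that product's reference interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # inclusive reference coordinate
    end: int    # exclusive reference coordinate

    def contains(self, other: "Interval") -> bool:
        return self.start <= other.start and other.end <= self.end


# Hypothetical tiled amplicons with 80bp overlaps (illustrative numbers only).
AMPLICONS = {
    "71": Interval(0, 400),
    "72": Interval(320, 720),
    "73": Interval(640, 1040),
}

# The "jumbo" product: amplification running from the start of 71 to the end
# of 73, so the amplicon 72 primer sites sit in its interior.
JUMBO = Interval(0, 1040)


def possible_sources(fragment: Interval) -> list[str]:
    """Return every product the fragment could have come from."""
    sources = [name for name, amp in AMPLICONS.items() if amp.contains(fragment)]
    if JUMBO.contains(fragment):
        sources.append("jumbo")
    return sources


# A 50bp fragment that starts in amplicon 71 and runs into the 72_LEFT primer
# site: it fits inside amplicon 71 and inside the jumbo amplicon.
short_fragment = Interval(310, 360)
print(possible_sources(short_fragment))

# A 150bp fragment crossing the same region: it fits in neither 71 nor 72
# alone, so only the jumbo amplicon explains it.
long_fragment = Interval(300, 450)
print(possible_sources(long_fragment))
```

Under these assumptions the short fragment is compatible with both amplicon 71 and the jumbo amplicon, so position alone cannot tell them apart, while the longer fragment can only have come from the jumbo amplicon. That is why the problem depends on whether fragmentation is used.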