iqbal-lab-org / viridian

MIT License

Artefact in 0.3.9 to evaluate on 0.9: variants in spike 142 in delta likely due to dropouts #52

Open iqbal-lab opened 2 years ago

iqbal-lab commented 2 years ago

Raising this issue to be followed up once we have a release candidate to evaluate.

Theo Sanderson spotted that if we look at spike 142 and colour a tree by genotype at that position, it looks a bit messy. This may be due to the issue he has previously raised (https://www.medrxiv.org/content/10.1101/2021.10.14.21264847v2). Adding text from a message here: the issue Theo found previously occurs in delta when primers are incorporated into the middle of a big amplicon (https://virological.org/t/missing-g21987a-mutation-in-sars-cov-2-delta-variants-due-to-non-specific-amplification-by-artic-v3-primers/764), making things hard, and sometimes impossible, to fix depending on whether fragmentation is used.

This is a tree of genomes assembled by viridian 0.3.7:

Screenshot 2022-05-11 at 14 19 46

Our intention was that read filtering would prevent this. This is from Theo:

"Re jumbo amplicon. If the amplicon looks like

[amplicon 71] [amplicon 72_LEFT primer] [amplicon 72] [amplicon 72_RIGHT primer] [amplicon 73]

If you then fragment that into 50 bp fragments, then it's hard to tell whether an [amplicon 71] [amplicon 72_LEFT] section is from this jumbo amplicon or is a legitimate read from the bit of amplicon 71 that overlaps amplicon 72."
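
To make the ambiguity Theo describes concrete, here is a minimal sketch. The coordinates are invented for illustration (they are not the real ARTIC scheme), and `possible_sources` is a hypothetical helper, not anything in viridian. It lists every product that could fully contain a fragment, assuming the jumbo amplicon runs from the start of amplicon 71 to the end of amplicon 73.

```python
# Hypothetical, made-up coordinates (half-open intervals); NOT the real ARTIC scheme.
AMPLICONS = {
    "amplicon_71": (100, 500),
    "amplicon_72": (400, 800),   # its LEFT primer would sit at roughly 400-425
    "amplicon_73": (700, 1100),
    # Assumption: the jumbo product spans from the start of 71 to the end of 73.
    "jumbo_71_73": (100, 1100),
}


def possible_sources(frag_start, frag_end, amplicons=AMPLICONS):
    """Return the names of all products that fully contain the fragment."""
    return [
        name
        for name, (start, end) in amplicons.items()
        if start <= frag_start and frag_end <= end
    ]


# A 50 bp fragment covering the end of amplicon 71 and the 72_LEFT primer site:
# position alone cannot separate amplicon 71 from the jumbo amplicon.
print(possible_sources(380, 430))  # ['amplicon_71', 'jumbo_71_73']

# A longer fragment that runs past the end of amplicon 71 can only be jumbo.
print(possible_sources(380, 530))  # ['jumbo_71_73']
```

With these toy numbers, short fragments are consistent with both a legitimate amplicon and the jumbo amplicon, whereas a fragment long enough to cross an amplicon boundary is not, which is one way to read "depending on whether fragmentation is used".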

iqbal-lab commented 2 years ago

Note this is also an issue with nanopore data.

G here:

Screenshot 2022-05-12 at 23 56 31

and V here:

Screenshot 2022-05-12 at 23 56 47

iqbal-lab commented 2 years ago

We definitely want to retest this with the refactor once it has stabilised, as this was definitely not fixed in earlier tags of the refactor. Recent fixes, including the primer-dimer filter (minimum fragment length), suggest it is worth another look. Leaving open.
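
For anyone following along, a minimum-fragment-length filter of the kind mentioned above could look roughly like the sketch below. This is only an illustration of the general idea, not viridian's actual code: the function name, the `(start, end)` representation and the threshold value are all assumptions.

```python
# Hypothetical threshold; the value used in viridian is not stated in this thread.
MIN_FRAGMENT_LENGTH = 50


def drop_short_fragments(fragments, min_length=MIN_FRAGMENT_LENGTH):
    """Keep only (start, end) fragments at least min_length long (half-open coordinates)."""
    return [(start, end) for start, end in fragments if end - start >= min_length]


# A 40 bp fragment (plausibly a primer-dimer) is dropped; a 300 bp fragment is kept.
print(drop_short_fragments([(0, 40), (100, 400)]))  # [(100, 400)]
```

A length filter like this removes very short products, but it would not by itself resolve the jumbo-amplicon ambiguity, since those fragments can be of normal length.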