hyanwong closed this 1 year ago
Oh, doh, we do this already.
In addition, we entirely exclude 481 problematic sites flagged as prone to sequencing errors or as highly homoplasic (https://github.com/W-L/ProblematicSites_SARS-CoV2/, accessed 2022-09-22)
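For readers unfamiliar with this kind of filter, the exclusion amounts to a position mask. This is a minimal sketch, not the pipeline's actual code. It assumes the problematic-sites list is read as VCF-style tab-separated lines whose 2nd column (POS) is the 1-based reference position and whose 7th column (FILTER) carries a label such as "mask" or "caution". Which labels to drop is also an assumption, and the function names `read_masked_positions` and `exclude_sites` are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def read_masked_positions(
    lines: Iterable[str], keep_filters: Optional[Iterable[str]] = ("mask",)
) -> Set[int]:
    """Collect 1-based positions from VCF-style lines.

    ASSUMPTION: columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...
    Pass keep_filters=None to keep every listed site regardless of FILTER.
    """
    wanted = None if keep_filters is None else set(keep_filters)
    positions = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        pos, filt = int(fields[1]), fields[6]
        if wanted is None or filt in wanted:
            positions.add(pos)
    return positions


def exclude_sites(site_positions: List[int], masked: Set[int]) -> List[int]:
    """Return the indexes of sites whose position is NOT in the masked set."""
    return [i for i, pos in enumerate(site_positions) if pos not in masked]
```

With toy input (made-up positions, not taken from the real list), only the "mask" site is dropped by default:

```python
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t20\t.\tA\t.\t.\tmask\t.",
    "ref\t30\t.\tC\t.\t.\tcaution\t.",
]
masked = read_masked_positions(vcf)     # {20}
exclude_sites([10, 20, 30], masked)     # [0, 2]
```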
Sorry.
I just saw https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 (from Nicola De Maio & colleagues, FWIW). They say:
It might be worth checking (a) whether we identify these sites (or others) as hypervariable, and (b) whether excluding them gives better results (e.g. fewer false positives).
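Check (a) boils down to a set comparison between the positions our pipeline treats as hypervariable and the positions on the problematic-sites list. A minimal sketch, assuming both are available as plain lists of integer positions; `compare_site_sets` is a hypothetical helper, and the Jaccard index is just one convenient summary of overlap.

```python
from typing import Dict, Iterable


def compare_site_sets(ours: Iterable[int], flagged: Iterable[int]) -> Dict[str, object]:
    """Summarise the overlap between our hypervariable sites and a flagged list."""
    ours_set, flagged_set = set(ours), set(flagged)
    shared = ours_set & flagged_set
    union = ours_set | flagged_set
    return {
        "shared": sorted(shared),                      # found by both
        "only_ours": sorted(ours_set - flagged_set),   # candidates the list misses
        "only_flagged": sorted(flagged_set - ours_set),
        # Jaccard index: |A intersect B| / |A union B|, defined as 0.0 when both are empty
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

For example, `compare_site_sets([10, 20, 30], [20, 30, 40])` reports two shared sites, one site unique to each side, and a Jaccard index of 0.5.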