yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

Suggestion to update removal rules to adapt to the current mass recombinant situation #352

Open aviczhl2 opened 9 months ago

aviczhl2 commented 9 months ago

UShER removes sequences with more than 5 reversions or more than 20 private mutations, to keep artefacts out of the tree.
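To make the rule concrete, here is a minimal sketch of the check as described above. It is not UShER's actual code: the `Placement` class, its field names, and the constant names are hypothetical, and only the two thresholds (5 and 20) come from this issue.

```python
from dataclasses import dataclass

# Thresholds as stated in this issue; the constant names are hypothetical.
MAX_REVERSIONS = 5
MAX_PRIVATE_MUTATIONS = 20


@dataclass
class Placement:
    """Hypothetical summary of one placed sequence."""
    name: str
    reversions: int          # mutations that revert the parent node back toward the reference
    private_mutations: int   # mutations found only on this sequence's own branch


def is_removed(p: Placement) -> bool:
    """Return True if the sequence exceeds either threshold and would be dropped."""
    return (p.reversions > MAX_REVERSIONS
            or p.private_mutations > MAX_PRIVATE_MUTATIONS)
```

Note that both comparisons are strict, so a sequence with exactly 5 reversions and exactly 20 private mutations is kept. A recombinant placed on one parent's branch tends to exceed these limits, because the segment inherited from the other parent shows up as reversions and private mutations.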

Recently, however, many recombinants between DV, EG, JD, GK, FL, FU, ... have been appearing, and they are often removed for having too many reversions or private mutations. Such removals are now quite frequent.

For example:

https://github.com/sars-cov-2-variants/lineage-proposals/issues/846

https://github.com/sars-cov-2-variants/lineage-proposals/issues/839

https://github.com/sars-cov-2-variants/lineage-proposals/issues/811

https://github.com/sars-cov-2-variants/lineage-proposals/issues/674

https://github.com/sars-cov-2-variants/lineage-proposals/issues/879

https://github.com/sars-cov-2-variants/lineage-proposals/issues/888

These recombinants keep increasing in number, and it is getting harder and harder to identify every removed recombinant, list it, and add it back to the tree.

One suggestion: before a new sequence is removed, check whether it is close to any known or potential recombinant. If it is, do not remove it.

Alternatively, allow stationary points to be added manually that prevent removal of sequences close to them, so that at least the sequences of each recombinant are no longer removed after its initial detection.
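A rough sketch of how such an exemption could work, purely as an illustration of the idea. Everything here is an assumption: sequences and manually added points are represented as sets of mutation strings, closeness is the size of the symmetric difference between two sets, and `MAX_ANCHOR_DISTANCE` is an arbitrary value. None of this is existing UShER behaviour or an existing option.

```python
from typing import FrozenSet, Iterable

# Thresholds as stated in this issue.
MAX_REVERSIONS = 5
MAX_PRIVATE_MUTATIONS = 20
# Hypothetical: how many differing mutations still count as "close".
MAX_ANCHOR_DISTANCE = 3


def anchor_distance(mutations: FrozenSet[str], anchor: FrozenSet[str]) -> int:
    """Number of mutations present in exactly one of the two sets."""
    return len(mutations ^ anchor)


def should_remove(mutations: FrozenSet[str],
                  reversions: int,
                  private_mutations: int,
                  anchors: Iterable[FrozenSet[str]]) -> bool:
    """Apply the removal thresholds, but spare sequences near a manual anchor."""
    exceeds = (reversions > MAX_REVERSIONS
               or private_mutations > MAX_PRIVATE_MUTATIONS)
    if not exceeds:
        return False
    # Exemption: keep the sequence if it is close to any manually added point.
    near_anchor = any(anchor_distance(mutations, a) <= MAX_ANCHOR_DISTANCE
                      for a in anchors)
    return not near_anchor
```

Usage with made-up placeholder mutations (not real lineage definitions):

```python
anchor = frozenset({"A1T", "C2G", "G3A", "T4C"})     # manually added point
seq = frozenset({"A1T", "C2G", "G3A", "T4C", "A5G"})  # one extra mutation
should_remove(seq, reversions=8, private_mutations=25, anchors=[anchor])  # False, kept
should_remove(seq, reversions=8, private_mutations=25, anchors=[])        # True, removed
```

With this design, one manual entry per detected recombinant would be enough to protect later sequences of the same recombinant, instead of adding each removed sequence back by hand.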