jmcbroome / autolin

Repository containing a method for automatically identifying pathogen lineages from a phylogeny.

Correct Handling Of Reversion Mutations #3

Open jmcbroome opened 1 year ago

jmcbroome commented 1 year ago

Currently I do not attempt to “cancel out” or otherwise handle reversions, because the lineage information routine is defined strictly in terms of the full ancestry path of a given sample. It is also potentially valuable to record and track reversions of an important mutation, assuming that these reversions are true events. However, in some cases, proposals for small groups descended from repeated conversions/reversions caused by contamination may receive an excessively high score, because they appear as a series of changes in a seemingly important region. They may also have a faulty Bloom lab escape value, since the Bloom lab escape scores, as implemented in the calculator script underlying their site, are not state-aware: they only care that a site “has changed”.
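To make the distinction concrete, here is a minimal sketch of the two views of an ancestry path: the set of sites that changed at any point, versus the net state after reversions are cancelled out. It assumes mutations are written as strings like `A100G` and that a path is a list of per-branch mutation lists ordered from root to sample. The function names are hypothetical and are not taken from autolin or from the Bloom lab calculator.

```python
import re

# Assumed mutation format: reference base, 1-based position, derived base.
MUT_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(mut):
    """Split a string like 'A100G' into ('A', 100, 'G')."""
    match = MUT_RE.match(mut)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mut!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def changed_sites(path):
    """State-unaware view: every site touched anywhere on the path."""
    return {parse_mutation(mut)[1] for branch in path for mut in branch}


def net_mutations(path):
    """State-aware view: net changes relative to the state at the root.

    A site that mutates and later reverts to its original base drops out.
    """
    original = {}  # position -> base before the first change on this path
    current = {}   # position -> base after the last change on this path
    for branch in path:
        for mut in branch:
            ref, pos, alt = parse_mutation(mut)
            original.setdefault(pos, ref)
            current[pos] = alt
    return [
        f"{original[pos]}{pos}{current[pos]}"
        for pos in sorted(current)
        if current[pos] != original[pos]
    ]


# Site 100 changes and then reverts; site 200 changes once.
path = [["A100G", "C200T"], ["G100A"]]
print(changed_sites(path))
print(net_mutations(path))
```

In this toy path, a state-unaware score would count both sites 100 and 200, while the net view reports only the change at site 200.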

Potential solutions include: ignoring this programmatically and filtering these proposals out in human review, since this case is relatively rare (depending on parameter sets); removing all reversion mutations from the tree prior to lineage inference (likely overaggressive, and it would fail to correctly depict sublineages that contain true reversions); or something between these extremes. I am open to suggestions.

jmcbroome commented 1 year ago

The main branch has been updated with a new filtering step that removes all branches with 2 or more reversions, along with their descendants, from the input tree before proceeding to lineage calling. I am still open to alternative approaches or to adding flexibility to the filtering.
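For readers who want a picture of what such a filter does, the following is a rough sketch only, not the code on the main branch. It assumes a toy `Node` class, mutation strings like `A100G`, and one particular definition of a reversion: a mutation that returns a site to the base it had before its first change on the path from the root. All names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy tree node; `mutations` are the changes on the branch above it."""
    name: str
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)


def parse_mutation(mut):
    """Split a string like 'A100G' into ('A', 100, 'G')."""
    return mut[0], int(mut[1:-1]), mut[-1]


def prune_reversion_branches(node, threshold=2, original=None):
    """Return a pruned copy of the tree, or None if `node` is removed.

    A branch is dropped, with all of its descendants, when it carries
    `threshold` or more reversions (assumed definition, see lead-in).
    """
    # Copy so that sibling subtrees do not see each other's mutations.
    original = dict(original or {})
    reversions = 0
    for mut in node.mutations:
        ref, pos, alt = parse_mutation(mut)
        if pos in original and alt == original[pos]:
            reversions += 1
        original.setdefault(pos, ref)
    if reversions >= threshold:
        return None
    kept = []
    for child in node.children:
        pruned = prune_reversion_branches(child, threshold, original)
        if pruned is not None:
            kept.append(pruned)
    return Node(node.name, list(node.mutations), kept)


def node_names(node):
    """Collect node names in preorder."""
    names = [node.name]
    for child in node.children:
        names.extend(node_names(child))
    return names


# Branch "b" reverts two sites at once, so "b" and its child "c" are removed.
c = Node("c", ["A5T"])
b = Node("b", ["G100A", "T200C"], [c])
d = Node("d", ["G100A"])
a = Node("a", ["A100G", "C200T"], [b, d])
e = Node("e", ["A7C"])
root = Node("root", [], [a, e])

print(node_names(prune_reversion_branches(root)))
```

Exposing `threshold` as a parameter is one way the filtering could be made more flexible: a higher value keeps more branches with plausible true reversions, at the cost of letting through more contamination-driven proposals.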