jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Classifying reversion mutations #94

Open hyanwong opened 1 year ago

hyanwong commented 1 year ago

We thought it would be a good idea to get a handle on revertants (i.e., mutations whose derived state is equivalent to the state of their parent mutation). We think these are mostly erroneous, so it would be good to identify what's going wrong. We could classify them by the nodes they are above (e.g., are they above a recombinant or a reversion push?).
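
For illustration only, here is a minimal sketch of one reading of that definition: a mutation counts as a reversion when its derived state restores the state that its parent mutation replaced. This is not the sc2ts code; the `Mutation` record, `inherited_state` and `find_reversions` are hypothetical names, and the only convention borrowed is using -1 for "no parent mutation".

```python
from dataclasses import dataclass

NULL = -1  # assumption: -1 marks a mutation with no parent mutation


@dataclass
class Mutation:
    # Hypothetical minimal record, not the sc2ts or tskit mutation class.
    site: int
    node: int
    derived_state: str
    parent: int = NULL  # index of the previous mutation at this site, towards the root


def inherited_state(mutations, ancestral_state, index):
    """State present at the site immediately before mutation `index` occurred."""
    mut = mutations[index]
    if mut.parent == NULL:
        return ancestral_state[mut.site]
    return mutations[mut.parent].derived_state


def find_reversions(mutations, ancestral_state):
    """Indices of mutations that restore the state their parent mutation replaced."""
    reversions = []
    for index, mut in enumerate(mutations):
        if mut.parent == NULL:
            continue  # no parent mutation, so nothing to revert
        replaced = inherited_state(mutations, ancestral_state, mut.parent)
        if mut.derived_state == replaced:
            reversions.append(index)
    return reversions


# Toy data: site 0 has ancestral state "A".
ancestral_state = {0: "A"}
mutations = [
    Mutation(site=0, node=1, derived_state="T"),            # A -> T
    Mutation(site=0, node=2, derived_state="A", parent=0),  # T -> A, reverts mutation 0
    Mutation(site=0, node=3, derived_state="G", parent=0),  # T -> G, not a reversion
]
reversion_ids = find_reversions(mutations, ancestral_state)
```

Only the second mutation is flagged here, since it is the only one that returns the site to the state its parent mutation overwrote.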

jeromekelleher commented 1 year ago

We have code for classifying mutations as reversions here, so I think we want two things:

  1. A plot to summarise the path length and time back to the thing it's reverting (i.e., either the parent mutation or the root; all mutations have a time in sc2ts). Both are useful, as we'd like to know, e.g., whether the majority of reversions revert a mutation one node above the immediate parent. So, I guess a method giving a histogram of path length and time difference would be a useful thing to add to the TreeInfo.
  2. A plot to classify the types of nodes that these reversions happen on. A simple first thing to do would be a bar chart in which the categorical axis is the node flags. This would also be a useful thing to add to the TreeInfo.
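
A rough sketch of the numbers behind both plots, assuming each reversion has already been paired with the mutation it reverts. Everything here is hypothetical toy data rather than the TreeInfo API: the `node_parent` array, the flag names, the `reversions` records and the `path_length` helper are made up for illustration. Time is assumed to run backwards, so older mutations have larger times.

```python
from collections import Counter

NULL = -1  # assumption: -1 marks "no parent", i.e. the root

# Hypothetical toy tree: node_parent[u] is the parent of node u.
node_parent = [NULL, 0, 1, 2, 1]
# Hypothetical flag names, one per node.
node_flags = ["root", "sample", "recombinant", "sample", "reversion_push"]

# Each reversion paired with the mutation it reverts (node and time for both).
reversions = [
    {"node": 2, "time": 5.0, "reverted_node": 1, "reverted_time": 9.0},
    {"node": 3, "time": 2.0, "reverted_node": 1, "reverted_time": 9.0},
    {"node": 4, "time": 4.0, "reverted_node": 1, "reverted_time": 9.0},
]


def path_length(node_parent, descendant, ancestor):
    """Number of edges walked from `descendant` up to `ancestor`."""
    steps = 0
    u = descendant
    while u != ancestor:
        u = node_parent[u]
        if u == NULL:
            raise ValueError("ancestor is not on the path to the root")
        steps += 1
    return steps


# Plot 1: histogram inputs for path length and time difference.
path_lengths = Counter(
    path_length(node_parent, r["node"], r["reverted_node"]) for r in reversions
)
time_diffs = [r["reverted_time"] - r["time"] for r in reversions]

# Plot 2: bar chart input, the number of reversions per node flag.
flag_counts = Counter(node_flags[r["node"]] for r in reversions)
```

The two `Counter` objects and the `time_diffs` list can be handed straight to whatever plotting library the notebook uses. Keeping the counting separate from the plotting also makes the summaries easy to check on a small tree.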

I think it's handy to add these QC-type functions to the notebooks/qc-template as they are done, so that we have something that you can run on a new set of trees to get an overall view of the quality.