matsengrp / larch

Inference and manipulation of history DAGs
MIT License
2 stars 2 forks source link

Calculating best parsimony score in DAG #13

Closed willdumm closed 2 years ago

willdumm commented 2 years ago

In addition to calculating the minimum parsimony score achieved by any tree in the DAG, we will want to also know the minimum parsimony score achieved by a subtree below each node in the DAG (since this will be useful for trimming).

The parsimony score of a tree is a sum of the number of mutations on all the edges of the tree. We can also imagine different kinds of tree weights which are also a sum over edges, of the value of a function which takes an edge and returns some sort of weight type that we know how to add.

In order to easily accommodate tree weights other than parsimony, I propose we write a general method for computing the minimum weight of a tree in the DAG. Let $f$ be the edge weight function, and let's imagine that we may also want to change what we mean by $min$ and $+$, depending on what the weight type is, and what we want to do with it. (For instance, even if $f$ returns a float, we might want to replace $\min$ with $max$, or use $\times$ instead of $+$)

Then we can compute the minimum weight $M$ of a subtree below each DAG node $v$ via dynamic programming as follows:

If $v$ is a leaf node, then $M(v) = 0$, or in general $M(v)$ should be the identity of whatever binary operator we're using instead of $+$. The function that computes the value of $M$ on leaf nodes could just be provided as a function argument

Otherwise, for a node $v$ with $k$ clades, and $C_i$ the set of child nodes below clade $i$, the minimum weight of a subtree below $v$ is:

$$M(v) = \sum{i=1}^k \min{v'\in C_i} \left (M(v') + f(v, v') \right )$$

It will be useful to cache the value of $M$ for use by the trimming function.