Calculating best parsimony score in DAG

In addition to calculating the minimum parsimony score achieved by any tree in the DAG, we will want to also know the minimum parsimony score achieved by a subtree below each node in the DAG (since this will be useful for trimming).

The parsimony score of a tree is a sum of the number of mutations on all the edges of the tree. We can also imagine different kinds of tree weights which are also a sum over edges, of the value of a function which takes an edge and returns some sort of weight type that we know how to add.

In order to easily accommodate tree weights other than parsimony, I propose we write a general method for computing the minimum weight of a tree in the DAG. Let $f$ be the edge weight function, and let's imagine that we may also want to change what we mean by $min$ and $+$, depending on what the weight type is, and what we want to do with it. (For instance, even if $f$ returns a float, we might want to replace $\min$ with $max$, or use $\times$ instead of $+$)

Then we can compute the minimum weight $M$ of a subtree below each DAG node $v$ via dynamic programming as follows:

If $v$ is a leaf node, then $M(v) = 0$, or in general $M(v)$ should be the identity of whatever binary operator we're using instead of $+$. The function that computes the value of $M$ on leaf nodes could just be provided as a function argument

Otherwise, for a node $v$ with $k$ clades, and $C_i$ the set of child nodes below clade $i$, the minimum weight of a subtree below $v$ is:

$$M(v) = \sum{i=1}^k \min{v'\in C_i} \left (M(v') + f(v, v') \right )$$

It will be useful to cache the value of $M$ for use by the trimming function.

matsengrp / larch

Calculating best parsimony score in DAG #13