matsengrp / larch

Inference and manipulation of history DAGs
2 stars 2 forks source link

weight enumeration instead of strictly optimal weight calculation #84

Closed marybarker closed 2 months ago

marybarker commented 3 months ago

Currently the Subtreeweight impls for parsimony and rf distance calculate the optimal weight below each node and iteratively traverse the tree that way. Each node calculates the optimal weight obtained over each clade, and accumulates those optimal weights together and stores it at that node. This is fairly simple, and makes it so that we can efficiently find the optimal RF/parsimony score.

However, it would be nice if we could extend this to get the range of values with their multiplicity. So instead of parsimony producing a single number, we'd have something like:

{score1: num_score1, ... scoreN: num_scoreN}

That is, a list of all the parsimony scores of all the trees in the DAG together with their multiplicity(and same thing for RF distance). Instead of saving the optimal weight at each node, it would instead save the ranges of weights based on the choices of node/clade children below each node, and then accumulate those, similarly to the optimality version.

So for example, if node X has 2 clades A and B, and the children below A have possible weights {A1, ..., AN}, and the children below B have {B1, ..., BM} respectively, then the weights accumulated to X would be an N x M-long vector of weights {A1*B1, ...., AN*BM}.

This could be implemented as an alternative subtree weight, or as an extension of existing ones.