matsengrp / gctree

GCtree: phylogenetic inference of genotype-collapsed trees
https://matsengrp.github.io/gctree
GNU General Public License v3.0

Allow all ranking stats to be computed on individual `CollapsedTree`s #109

Open willdumm opened 2 years ago

willdumm commented 2 years ago

Because ctrees are ete trees, while some ranking stats (e.g. mutability parsimony) are implemented only for history DAGs, it's difficult to match ranking stats to individual collapsed trees extracted from a parsimony forest.

There are two options for fixing this. The elegant but inefficient option, which is also not backwards-compatible with older pickled trees, is to store the original history on each ctree object, so that `optimal_weight_annotate` kwargs can be used to compute any stats of interest.

The more practical option would be to implement a ctree method for each ranking stat that computes it directly from the ete tree.
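As a rough illustration of computing a stat directly from a tree, here is a minimal sketch that sums a pairwise distance over every parent-child edge. The `Node` class is a hypothetical stand-in for an ete node (it only mimics `up`, `sequence`, and `iter_descendants`), and `hamming` and `edge_sum` are illustrative names, not part of gctree.

```python
from typing import Callable, Iterator, List, Optional


class Node:
    """Hypothetical stand-in for an ete tree node (not the ete or gctree class)."""

    def __init__(self, sequence: str, up: Optional["Node"] = None) -> None:
        self.sequence = sequence
        self.up = up  # parent node, or None for the root
        self.children: List["Node"] = []
        if up is not None:
            up.children.append(self)

    def iter_descendants(self) -> Iterator["Node"]:
        # Yield every node below this one; the node itself is excluded,
        # so every yielded node has a parent.
        for child in self.children:
            yield child
            yield from child.iter_descendants()


def hamming(parent_seq: str, child_seq: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(parent_seq, child_seq))


def edge_sum(root: Node, distance: Callable[[str, str], float]) -> float:
    """Sum distance(parent, child) over every edge below `root`."""
    return sum(distance(n.up.sequence, n.sequence) for n in root.iter_descendants())


# Tiny example tree: a root with two children, one of which has a child.
root = Node("AAAA")
left = Node("AAAT", up=root)
right = Node("AACA", up=root)
leaf = Node("AATT", up=left)

total = edge_sum(root, hamming)
print(total)
```

Because the distance function is passed in as an argument, any stat that decomposes into a sum over edges could reuse the same traversal under this design.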

willdumm commented 2 years ago

For example, here's how this can be done for mutability parsimony:

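import gctree.mutation_model as mm

# Build the mutation model from the mutability and substitution files.
mut_model = mm.MutationModel(mutability_file='path_to_mutability_file', substitution_file='path_to_substitution_file')
# `splits` must already be defined; it is described after this snippet.
mutability_distance = mm._mutability_distance(mut_model, splits=splits)

def mutability_parsimony(ctree):
    # Sum the mutability distance over every parent-child edge of the ete tree.
    return sum(mutability_distance(n.up.sequence, n.sequence) for n in ctree.tree.iter_descendants())

# Print the mutability parsimony of each collapsed tree in the forest.
for ctree in forest:
    print(mutability_parsimony(ctree))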

Here, `splits` is a list of the indices at which sequences are concatenated.
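To make the meaning of the split indices concrete, here is a small sketch under the assumption that each index marks the position in the concatenated string where a new sequence begins. The helper `split_at` and the example sequences `first` and `second` are hypothetical and only illustrate the convention; this is not how gctree handles splits internally.

```python
from typing import List


def split_at(sequence: str, splits: List[int]) -> List[str]:
    """Cut `sequence` at each index in `splits`, returning the pieces in order."""
    bounds = [0, *sorted(splits), len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


# Two made-up sequences joined end to end.
first, second = "ACGTAC", "GGTT"
concatenated = first + second

# Assumed convention: the split index is where the second sequence starts.
splits = [len(first)]
pieces = split_at(concatenated, splits)
print(pieces)
```

With an empty `splits` list the whole string comes back as a single piece, which corresponds to a sequence that was not concatenated.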