tskit-dev / tsinfer

Infer a tree sequence from genetic variation data.
GNU General Public License v3.0

Use parsimony algorithm to compress likelihoods #374

Open jeromekelleher opened 3 years ago

jeromekelleher commented 3 years ago

The current approach to compressing likelihoods on the tree is a simple scheme based on merging values that are the same as their parents. This is not guaranteed to give the smallest possible number of state transitions on the tree. To get the minimum, we need to use the Hartigan parsimony algorithm, as we do in the tskit haplotype matching code.
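For illustration only, here is a minimal sketch of Hartigan-style parsimony on a toy tree. It is not tsinfer's or tskit's code: the function name `hartigan_transitions`, the `children` dict representation, and the assumption that values are known only at the leaves are all hypothetical choices made to keep the example small. The bottom-up pass keeps, for each node, the set of values shared by the most children; the top-down pass records a transition only where a child cannot keep its parent's value.

```python
from collections import Counter


def hartigan_transitions(children, root, leaf_values):
    """Return (root_value, transitions) minimising the number of transitions.

    children: dict mapping node -> list of child nodes (hypothetical layout).
    leaf_values: dict mapping leaf node -> value (e.g. a rounded likelihood).
    transitions: list of (node, value) where the value differs from the parent.
    """
    # Visit order in which every parent appears before its children.
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))

    # Bottom-up pass: the optimal set of a node holds the values that
    # appear in the optimal sets of the largest number of its children.
    optimal = {}
    for u in reversed(order):
        kids = children.get(u, [])
        if not kids:
            optimal[u] = {leaf_values[u]}
        else:
            counts = Counter(v for c in kids for v in optimal[c])
            best = max(counts.values())
            optimal[u] = {v for v, n in counts.items() if n == best}

    # Top-down pass: keep the parent's value whenever it is optimal for
    # the child, otherwise pick a value from the child's set (ties are
    # broken arbitrarily with min) and record a transition.
    state = {root: min(optimal[root])}
    transitions = []
    for u in order:
        for c in children.get(u, []):
            if state[u] in optimal[c]:
                state[c] = state[u]
            else:
                state[c] = min(optimal[c])
                transitions.append((c, state[c]))
    return state[root], transitions


# Toy tree: root 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6).
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
leaf_values = {3: 0.5, 4: 0.5, 5: 0.5, 6: 0.1}
root_value, transitions = hartigan_transitions(children, 0, leaf_values)
print(root_value, transitions)  # 0.5 [(6, 0.1)]
```

In this toy case a single transition above leaf 6 is enough, so only the root value plus one extra value would need to be stored. How this maps onto the real likelihood storage, where values are rounded according to the precision parameter, is left open here.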

This "should" have a significant impact on performance, and make the precision parameter more meaningful, but we'll have to see. It's not a trivial change, as the code is highly optimised for the current method.

hyanwong commented 2 years ago

Also see https://github.com/tskit-dev/tskit/issues/1040 for discussion of parsimonious placements of likelihood in tskit