magikker / TreeHouse-Private

TreeHouse development.
GNU General Public License v3.0
0 stars 0 forks source link

Compute the k-tet similarity between two trees #35

Open magikker opened 11 years ago

magikker commented 11 years ago

If we think about a bipartition as defining the relationship between all taxa in a tree over an edge, and a quartet is defining the relationship of 4 taxa over an edge, we can think of a k-tet as defining the relationship between k-taxa over an edge.

There are existing bipartition based distance measures and quartet based distance measures.... but no one has a k-tet based distance measure. It'd take 2 trees and a k value, and return how many k-tets the trees had in common. If we know the number in common, and the total possible k-tets for the trees we can compute a distance with by subtracting total_possible - common.

These two trees have no bipartitions in common

((((((A,B),C),D),E),F),(G,H)); ((((((H,B),C),D),E),F),(G,A));

But a lot of similar internal structure. They've got 15 quartets in common. How many 5-tets, 6-tets? Measures like quartets and k-tets are more robust than bipartitions. That robustness comes at the cost of memory and speed. If we can do an arbitrary k-tet distance then we can control the trade off between robustness and mem/speed.