magikker / TreeHouse-Private

TreeHouse development.
GNU General Public License v3.0

Distance based Tree search #6

Open magikker opened 11 years ago

magikker commented 11 years ago

What: This would be a new search function which would take an input tree and return the tree(s) that are most similar to the input tree based on some similarity measure.

It would solve the problem of "What tree in the set is most similar to my input tree?"

Title: similar_tree_search (could use a better name?)

Input/Output: It would take an input tree (Newick string?) and return the tree most similar to it. It should probably print the % similarity to the screen and return the tree or trees to the command line. So if the input tree has 20 bipartitions and the closest tree you can find shares 10 bipartitions, then you'd have a 50% match. All trees with that 50% match should probably be returned.
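To make the scoring above concrete, here is a minimal sketch in Python. The function name `percent_match` and the use of plain integers as stand-ins for bipartitions are assumptions for illustration only, not existing TreeHouse code.

```python
def percent_match(input_biparts, candidate_biparts):
    """Percent of the input tree's bipartitions that also occur in the candidate.

    Both arguments are collections of hashable bipartition keys
    (hypothetical encoding; any canonical representation would do).
    """
    input_biparts = set(input_biparts)
    if not input_biparts:
        # No bipartitions to match against, so report 0% rather than divide by zero.
        return 0.0
    shared = input_biparts & set(candidate_biparts)
    return 100.0 * len(shared) / len(input_biparts)


# The example from the description: 20 input bipartitions, 10 of them shared.
input_tree = set(range(20))
candidate = set(range(10)) | set(range(100, 110))
print(percent_match(input_tree, candidate))  # 50.0
```

Note that this normalizes by the number of bipartitions in the input tree, as in the 20/10 example; normalizing by the candidate or by the union would give a different measure.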

Implementation ideas:

Take an input tree and get its bipartitions with dfs_compute_bitstrings(). Then look for the trees that match the most of the input tree's bipartitions.
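A rough sketch of that search in Python, under stated assumptions: the bipartitions of every tree have already been extracted (for example by something like dfs_compute_bitstrings(), whose actual signature is not shown here) and are stored as normalized bitstrings, one character per taxon, with the first taxon always on the `1` side. The function name comes from the proposed title; the data layout is hypothetical.

```python
def similar_tree_search(input_biparts, tree_biparts):
    """Return (best_percent, indices of all trees tied for the best match).

    input_biparts: set of bipartition bitstrings for the query tree.
    tree_biparts:  list of such sets, one per tree in the tree set.
    Bitstrings are assumed to be normalized so that equal bipartitions
    compare equal (hypothetical encoding for this sketch).
    """
    input_biparts = set(input_biparts)
    best_shared = -1
    best_trees = []
    for idx, biparts in enumerate(tree_biparts):
        shared = len(input_biparts & set(biparts))
        if shared > best_shared:
            # New best: start a fresh list of matching trees.
            best_shared, best_trees = shared, [idx]
        elif shared == best_shared:
            # Tie with the current best: keep this tree as well.
            best_trees.append(idx)
    if not input_biparts or best_shared < 0:
        return 0.0, best_trees
    return 100.0 * best_shared / len(input_biparts), best_trees


# Five taxa (A..E); each unrooted binary tree has two non-trivial bipartitions.
query = {"11000", "11100"}            # ((A,B),C,(D,E))
trees = [
    {"11000", "11010"},               # shares AB|CDE only
    {"11000", "11100"},               # identical to the query
    {"10100", "11100"},               # shares ABC|DE only
    {"11000", "11100"},               # identical to the query again
]
print(similar_tree_search(query, trees))  # (100.0, [1, 3])
```

This is a linear scan over the tree set; if the set already keeps an index from bipartition to the trees containing it, counting hits per tree through that index would avoid touching trees that share nothing with the query.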

Possible Variations:

  1. Shared bipartitions is just one measure of similarity. One could imagine allowing options for other measures, such as quartets.
  2. Strictly matching bipartitions only makes sense when we're dealing with a taxa-homogeneous tree set and search tree. How could this extend to heterogeneous sets?

jaHoltz commented 11 years ago

Completed for shared bipartitions specifically