niemasd / TreeCluster

Efficient phylogenetic clustering of viral sequences
GNU General Public License v3.0

TreeCluster: is it only for binary trees? #2

Closed ghsama closed 4 years ago

ghsama commented 4 years ago

Hey, I was wondering if the algorithm is only usable for binary trees, and if so, is there a possibility to generalize it to all kinds of trees (or at least rooted ones)? Thanks.

niemasd commented 4 years ago

It should work on arbitrary trees (not necessarily binary)

The methods that don't say "Clade" should work on both rooted and unrooted trees if I'm not mistaken

EDIT: Fixed typo ("Clare" --> "Clade")

niemasd commented 4 years ago

I'm going to close this issue because it seems like the question is answered, but please feel free to follow up (either on this Issue or in a new one) and I'll be happy to help :-)

ghsama commented 4 years ago

Hey, thank you for your response. But the original article (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221068) states in the problem definition: Let T = (V, E) be an unrooted binary tree represented by an undirected acyclic graph with vertices V (each with degree one or three), weighted edges E, and leafset. Also, Algorithm 1 relies on each node having a 'left child' and a 'right child'. If you have come up with a more general algorithm, and that is the one implemented, could you please give me a reference? Thank you.

niemasd commented 4 years ago

Given a tree with polytomies, we can simply resolve all polytomies arbitrarily with 0-length branches to yield a binary tree that is equivalent with respect to the clustering definitions we provide. For example, imagine the following tree:

(A:1,B:2,C:3);

--- A
|
------ B
|
--------- C

--- = length of 1

If we arbitrarily resolve the polytomy with 0-length branches, we could get the following binary tree:

((A:1,B:2):0,C:3);

  --- A
  |
|-
| |
| ------ B
|
--------- C

--- = length of 1
-   = length of 0

As can be seen, the addition of this 0-length branch does not impact pairwise distances, root-to-tip distances, maximum branch length, sum of branch lengths, etc.
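To make the invariance concrete, here is a minimal Python sketch of the idea, not TreeCluster's actual code. The `Node` class and the helper names (`resolve_polytomies`, `pairwise_distances`, `total_branch_length`) are hypothetical and exist only for this illustration. It builds the three-leaf tree above by hand, resolves the polytomy with a 0-length branch, and compares the pairwise leaf distances and the total branch length before and after.

```python
from itertools import combinations


class Node:
    """Hypothetical minimal tree node; `length` is the branch above the node."""

    def __init__(self, label=None, length=0.0, children=None):
        self.label = label
        self.length = length
        self.children = children or []


def resolve_polytomies(node):
    """Make the tree binary in place by grouping children under 0-length branches."""
    while len(node.children) > 2:
        first, second = node.children[0], node.children[1]
        joiner = Node(length=0.0, children=[first, second])
        node.children = [joiner] + node.children[2:]
    for child in node.children:
        resolve_polytomies(child)
    return node


def leaf_paths(root):
    """Return {leaf label: [(node, depth), ...]} for the path from the root to each leaf."""
    paths = {}

    def walk(node, depth, path):
        path = path + [(node, depth)]
        if not node.children:
            paths[node.label] = path
        for child in node.children:
            walk(child, depth + child.length, path)

    walk(root, 0.0, [])
    return paths


def pairwise_distances(root):
    """Return {(leaf_a, leaf_b): path length between the two leaves}."""
    paths = leaf_paths(root)
    dists = {}
    for a, b in combinations(sorted(paths), 2):
        lca_depth = 0.0
        # Walk both root-to-leaf paths together; the last shared node is the LCA.
        for (node_a, depth_a), (node_b, _) in zip(paths[a], paths[b]):
            if node_a is not node_b:
                break
            lca_depth = depth_a
        dists[(a, b)] = paths[a][-1][1] + paths[b][-1][1] - 2 * lca_depth
    return dists


def total_branch_length(node):
    """Sum of all branch lengths below `node`."""
    return sum(c.length + total_branch_length(c) for c in node.children)


def example_tree():
    # (A:1,B:2,C:3);
    return Node(children=[Node("A", 1.0), Node("B", 2.0), Node("C", 3.0)])


before = pairwise_distances(example_tree())
resolved = resolve_polytomies(example_tree())  # ((A:1,B:2):0,C:3);
after = pairwise_distances(resolved)

print(before)
print(after == before)
print(total_branch_length(example_tree()), total_branch_length(resolved))
```

Which two children get grouped first is arbitrary in this sketch (it always takes the first two), and any other choice would give the same distances because every inserted branch has length 0.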

niemasd commented 4 years ago

I'll close this issue because it seems like the question is answered, but please feel free to follow up (either on this Issue or in a new one) and I'll be happy to help!

TL;DR: Given a non-binary tree, TreeCluster resolves polytomies arbitrarily with 0-length branches, which does not change the result of any of the clustering methods implemented in TreeCluster.