xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0
26 stars 6 forks source link

pll_utree_unroot returns a virtual root that is inconsistent with the expected traversal order #149

Closed pierrebarbera closed 5 years ago

pierrebarbera commented 5 years ago

Currently, if you give pll_utree_unroot an rtree that has a non-leaf subtree on the left child (as defined in the newick string), it will set the virtual root to point to the utree_node that is toward the edge on which the rtree root would have been before:

    R
  /   \
 A     B

vroot = A , where A->back = B

However, when you execute a postorder traversal on a utree, the subtree that is traversed first is vroot->back (see https://github.com/BenoitMorel/libpll/blob/repeats/src/utree.c#L426) In this case, after unrooting, vroot->back would point to B, which is the right subtree from the newick perspective.

This means that a rooted tree ((A,B),(C,D)) after being read in and unrooted, would be traversed in the same order as the unrooted tree ((C,D),A,B), when presumably the user would expect the traversal order to be identical to a tree (A,B,(C,D)).

While traversal order should not matter in most scenarios, and the lack of proper ordering can be seen as a weakness of newick, it would be good to have some consistency here. It certainly makes maintaining order in this edge case a little easier.

Proposed fix: In the "left child isn't leaf" case of pll_utree_unroot, make the function return vroot->next instead of vroot (where vroot is the utree_node connecting subtree A to subtree B from the previous example). In the case where the left child of the rooted tree root is aleaf, no change is needed as this node's back pointer refers to the left subtree.

pierrebarbera commented 5 years ago

migrated to libpll-2 (https://github.com/xflouris/libpll-2/issues/6)