niemasd / TreeSwift

TreeSwift: Fast tree module for Python 3
https://niema.net/TreeSwift
GNU General Public License v3.0
75 stars 14 forks source link

Preorder traversal order #21

Closed theosanderson closed 2 years ago

theosanderson commented 2 years ago

Thank you for this package!

I've been having some issues with pre-order traversal. I think it's possible that nodes appear in the wrong order, or at least a different order to Dendropy.

import treeswift
import dendropy

newick_string = """
(mammal:0.14,(turtle:0.02,(rayfinfish:0.25,(frog:0.01,salamander:0.01):0.12):0.09):0.03);

        """
ts_tree = treeswift.read_tree_newick(newick_string)

dp_tree = dendropy.Tree.get(data=newick_string, schema="newick")

ts_tree_iter = ts_tree.traverse_preorder()
dp_tree_iter = dp_tree.preorder_node_iter()

for x in range(1000):
    ts_node = next(ts_tree_iter)
    dp_node = next(dp_tree_iter)

    print("TS:", ts_node.label, "DP:", dp_node.taxon)
TS: None DP: None
TS: None DP: 'mammal'
TS: None DP: None
TS: None DP: 'turtle'
TS: salamander DP: None
TS: frog DP: 'rayfinfish'
TS: rayfinfish DP: None
TS: turtle DP: 'frog'
TS: mammal DP: 'salamander'

Let me know if I have this wrong. Thanks again!

niemasd commented 2 years ago

Thanks for reaching out! This behavior is expected: the only guarantee of a preorder traversal is that you will visit a parent node before you visit its children, but beyond that restriction, nodes can be visited in any order. Thus, there's no guarantee that the nodes will be visited in the same order across tools (or even the same tree in the same tool, but with children rotated)

theosanderson commented 2 years ago

Thank you, sorry for the noise!

theosanderson commented 2 years ago

(sorry fat finger hit reopen and posted a half-written comment)

For info some sources do suggest that pre-order is a defined ordering https://en.m.wikipedia.org/wiki/Tree_traversal, mentions pre-order and reverse pre-order (right before left). It could be worth clarifying in the docs that this isn't this type of defined pre-order. (But this may well be context-dependent and I'm new to tree-stuff!) Thanks again for explaining this.

niemasd commented 2 years ago

Thanks for sharing! In fully bifurcating trees, because every internal node has exactly 2 children, the only two ways of defining a preorder would be Visit-Left-Right or Visit-Right-Left, hence the ability to name them. However, this does not generalize to multifurcating trees (which TreeSwift assumes support for) nor does it really apply to phylogenies (because "left" and "right" don't really have meaning, as child order in a phylogeny is arbitrary)

If you're interested in tree algorithms (or data structures in general), be sure to check out my textbook 😄 It covers the various simple tree traversal algorithms but also discusses quite a few other data structures and their algorithms

https://stepik.org/course/Data-Structures-579

theosanderson commented 2 years ago

Haha, thanks, and sorry