xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0
26 stars 6 forks source link

Add node_id field to pll_utree_t/pll_rtree_t #98

Closed amkozlov closed 7 years ago

amkozlov commented 8 years ago

As discussed today, it would be very helpful to have an explicit node/utree identifier, independent from clv_index field (which is currently (mis)used as an implicit node identifier).

The sample code for initializing such unique node IDs is already available in @Pbdas repository: https://github.com/Pbdas/epa/blob/master/src/pll_util.cpp#L83

It uses the existing libpll convention, i.e. node IDs [0..tip_count-1] are reserved for the tips. Furthermore, three nodes of "triplet" (inner node) get adjacent node IDs, which allows to compute triplet ID easily (we can provide a helper function in libpll-modules for this).

So could we add this field and initialize it in pll_utree_parse_newick?

amkozlov commented 8 years ago

Implemented as described above: https://github.com/xflouris/libpll/commit/45e7aead36aabbd3c13175ad665562ffc69dbfb5

I called the field node_index for now, although subnode_index or just index might be a better name. Do we actually introduce the standardized terminology somewhere in documentation (e.g. how to call those three circularly-linked pll_utree_t structures in the inner nodes)?

lczech commented 8 years ago

To put my two cents in: Call the inner circlarly-linked thing a link. That way, a node consists of three links.

The terminology avoids confusion and makes documentation straight forward. Furthermore, if you then call the two links that connect a node an edge, you end up with three words of four letters each, which makes code indentation really nice (even works with tree!) ;-) That's at least my experience from doing exactly this in genesis.

PS: If you want to rename things, you might also want to consider calling the back-link the outer-link, again to make it clear what it actually does.