tjunier / newick_utils

shell tools for processing phylogenetic trees

nw_stats feature request #12

Open alephreish opened 8 years ago

alephreish commented 8 years ago

The two trees ((1,2),3,(4,5)); and ((1,2),3,(4,5,6)); have exactly the same number of splits (2), yet nw_stats reports an incorrect result for the tree with the multifurcation:

$ nw_stats - -f l <<< '((1,2),3,(4,5));'   | cut -f4
2
$ nw_stats - -f l <<< '((1,2),3,(4,5,6));' | cut -f4
1

This is a rather serious bug.

alephreish commented 8 years ago

Or even:

$ nw_stats - -f l <<< '((1,2,3),(4,5,6),(7,8,9));' | cut -f4
0

alephreish commented 8 years ago

Any comment?

josephwb commented 8 years ago

@har-wradim This returns the number of dichotomies (bifurcations), so the results are correct. By "splits", do you mean internal nodes? Those are not reported (but probably should be, as an additional value).
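To make the distinction between dichotomies and internal nodes concrete, here is a minimal sketch that counts both in a single Newick string with awk. node_counts is a hypothetical helper, not a newick_utils command, and it is not how nw_stats is implemented; it assumes labels and bracketed comments contain no parentheses or commas. For the trees in this thread, its second column matches the nw_stats column 4 values shown above.

```shell
#!/usr/bin/env bash
# node_counts: hypothetical helper, NOT part of newick_utils.
# Prints "<internal nodes><TAB><dichotomies>" for one Newick tree,
# where a dichotomy is an internal node with exactly two children.
# Assumption: labels and [comments] contain no "(", ")" or ",".
node_counts() {
    printf '%s\n' "$1" | awk '
    {
        depth = 0; internal = 0; dich = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") {
                depth++; internal++
                kids[depth] = 1          # a new internal node starts with one child
            } else if (c == ",") {
                kids[depth]++            # each comma adds a sibling at this depth
            } else if (c == ")") {
                if (kids[depth] == 2) dich++
                depth--
            }
        }
        printf "%d\t%d\n", internal, dich
    }'
}

node_counts '((1,2),3,(4,5));'             # prints 3<TAB>2
node_counts '((1,2),3,(4,5,6));'           # prints 3<TAB>1
node_counts '((1,2,3),(4,5,6),(7,8,9));'   # prints 4<TAB>0
```

Both multifurcating trees keep three internal nodes while losing dichotomies, which is exactly why column 4 drops to 1 and 0.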

alephreish commented 8 years ago

OK, I see:

$ nw_stats - -f l <<< '(((1,2,3),(4,5,6)),(7,8,9));' | cut -f4
2

No, by splits I mean bipartitions, i.e. the partitions of the taxa into two sets induced by the branches of the tree.

My mistake stems from the fact that one would normally expect the number of splits, not the number of dichotomies, as one of the summary statistics for an unrooted tree.

Let's convert this thread into a feature request.

josephwb commented 8 years ago

Right. They are intimately related (of course): the number of (nontrivial) bipartitions equals num.internal.nodes - 1 for unrooted trees and num.internal.nodes - 2 for rooted trees, counting the root among the internal nodes. I agree this would be a useful property to return.
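A minimal sketch of the requested statistic, applying the formula above to a single Newick string. count_splits is a hypothetical helper, not a newick_utils command. It treats every tree as unrooted: when the root has exactly two children it subtracts 2 (that root disappears on unrooting and its two branches merge into one), otherwise it subtracts 1. It assumes no single-child internal nodes and no labels or bracketed comments containing parentheses or commas.

```shell
#!/usr/bin/env bash
# count_splits: hypothetical helper, NOT part of newick_utils.
# Prints the number of nontrivial bipartitions of one Newick tree,
# treated as unrooted.
# Assumptions: no unary (single-child) internal nodes; labels and
# [comments] contain no "(", ")" or ",".
count_splits() {
    printf '%s\n' "$1" | awk '
    {
        depth = 0; internal = 0; root_commas = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") {
                depth++; internal++         # every "(" opens an internal node
            } else if (c == ")") {
                depth--
            } else if (c == "," && depth == 1) {
                root_commas++               # commas separating the children of the root
            }
        }
        # Root with two children: it vanishes on unrooting and its two
        # branches merge into one, so subtract 2; otherwise subtract 1.
        splits = (root_commas == 1) ? internal - 2 : internal - 1
        if (splits < 0) splits = 0
        print splits
    }'
}

count_splits '((1,2),3,(4,5));'              # prints 2
count_splits '((1,2),3,(4,5,6));'            # prints 2
count_splits '((1,2,3),(4,5,6),(7,8,9));'    # prints 3
count_splits '(((1,2,3),(4,5,6)),(7,8,9));'  # prints 3
```

The last two trees give the same count because they are the same unrooted tree, which is the behavior one would expect from a split count as opposed to a dichotomy count.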