tjunier / newick_utils

shell tools for processing phylogenetic trees

Rootedness in nw_condense #16

Open · alephreish opened this issue 8 years ago

alephreish commented 8 years ago

Take the following unrooted tree:

$ cat test.nwk 
(((A,A),(C,D)),E,E);

I expected nw_condense to condense both the (A,A) and the (E,E) pairs, but the polytomy at the root seems to be an obstacle:

$ nw_condense test.nwk | nw_display - -w 20
      /----------+ A
      |             
 /----+     /----+ C
 |    \-----+       
 |          \----+ D
=+                  
 +---------------+ E
 |                  
 \---------------+ E

$ nw_reroot test.nwk A | nw_condense - | nw_display - -w 20
 /---------------+ A
 |                  
=+   /-----------+ A
 |   |              
 \---+       /---+ C
     |   /---+      
     \---+   \---+ D
         |          
         \-------+ E
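Both outputs are consistent with a purely rooted rule: a node is replaced by a single leaf only when all of its children are leaves carrying the same label. The Python sketch below assumes that rule. It is inferred from the two displays, not taken from the nw_condense source. `parse_newick`, `condense` and `to_newick` are hypothetical helpers, and the parser only accepts topology-only Newick. The rerooted Newick string was written by hand to match the second display rather than copied from nw_reroot output.

```python
def parse_newick(text):
    """Parse topology-only Newick (no branch lengths, quotes or internal labels)
    into nested lists of leaf-label strings."""
    s, pos = text.strip().rstrip(";"), 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            children = []
            while s[pos] in "(,":
                pos += 1                      # step over '(' or ','
                children.append(node())
            pos += 1                          # step over ')'
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def condense(tree):
    """Bottom-up: replace a node by one leaf when all of its children are leaves
    with the same label (assumed rule, consistent with the displays above)."""
    if isinstance(tree, str):
        return tree
    kids = [condense(c) for c in tree]
    if all(isinstance(k, str) for k in kids) and len(set(kids)) == 1:
        return kids[0]
    return kids


def to_newick(tree):
    """Serialise the nested-list form back to Newick (without the trailing ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(c) for c in tree) + ")"


print(to_newick(condense(parse_newick("(((A,A),(C,D)),E,E);"))) + ";")
# ((A,(C,D)),E,E);
print(to_newick(condense(parse_newick("(A,(A,((C,D),(E,E))));"))) + ";")
# (A,(A,((C,D),E)));
```

Under this rule the root's three children ((A,A),(C,D)), E and E are not all leaves, so the two E leaves survive. After rerooting on one A, the two A leaves are no longer siblings, so it is the A pair that survives instead.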

Since the tree is unrooted, the groups AA and EE are symmetric (the tree contains both splits AA|CDEE and EE|AACD) and should be treated equally.
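You can check the symmetry by listing splits. In the nested-list form, every non-root node marks one edge of the unrooted tree, and the leaves below it form one side of a split. The Python sketch below uses `parse_newick` and `splits`, which are hypothetical helpers. It assumes single-character labels so that a side can be written as a sorted string. It shows that both AA|CDEE and EE|AACD are present, and that the hand-written rerooted version of the same tree yields exactly the same set of splits.

```python
def parse_newick(text):
    """Parse topology-only Newick (no branch lengths, quotes or internal labels)."""
    s, pos = text.strip().rstrip(";"), 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            children = []
            while s[pos] in "(,":
                pos += 1                      # step over '(' or ','
                children.append(node())
            pos += 1                          # step over ')'
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def splits(tree):
    """Return every split of the unrooted tree as a sorted pair of label strings."""
    labels = []                               # labels[i] is the label of the i-th leaf
    sides = []                                # leaf-index set below each node, post-order

    def walk(t):
        if isinstance(t, str):
            labels.append(t)
            below = {len(labels) - 1}
        else:
            below = set().union(*(walk(c) for c in t))
        sides.append(below)
        return below

    walk(tree)
    sides.pop()                               # post-order: the root is last and spans all leaves
    every = set(range(len(labels)))

    def word(idx):
        return "".join(sorted(labels[i] for i in idx))

    # A set removes the duplicate split produced by the two children of a bifurcating root.
    return {tuple(sorted((word(s), word(every - s)))) for s in sides}


unrooted = splits(parse_newick("(((A,A),(C,D)),E,E);"))
rerooted = splits(parse_newick("(A,(A,((C,D),(E,E))));"))
print(("AA", "CDEE") in unrooted, ("AACD", "EE") in unrooted)   # True True
print(unrooted == rerooted)                                     # True
```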

alephreish commented 8 years ago

The practical problem at hand is collapsing in-paralogs (paralogs that arose after the last speciation event considered) in trees built for some 10,000 orthologous groups, for the purpose of gene tree/species tree reconciliation. The raw gene trees are unrooted, and the order of the leaves is unpredictable.

My current workaround is to iterate over the leaves of each tree, rerooting on each in turn, to find one that allows all in-paralogs to be collapsed. This nevertheless fails in the rare cases where every species in the tree has in-paralogs, because rooting on any leaf then splits that leaf's own in-paralog group.
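One rooting-independent rule treats the tree as an unrooted graph and cuts each edge in turn. If one side holds two or more leaves that all carry the same label, and the other side still holds at least one leaf, that side is replaced by a single leaf with that label. The largest such side is collapsed first. The Python sketch below implements this rule. It is a hypothetical alternative, not an existing newick_utils option, and all function names are illustrative. Because no leaf has to serve as the root, the rule also handles trees in which every species has in-paralogs. The parser accepts only topology-only Newick, and the output is rooted arbitrarily at the first remaining internal node.

```python
from itertools import count


def parse_newick(text):
    """Parse topology-only Newick (no branch lengths, quotes or internal labels)."""
    s, pos = text.strip().rstrip(";"), 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            children = []
            while s[pos] in "(,":
                pos += 1                      # step over '(' or ','
                children.append(node())
            pos += 1                          # step over ')'
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def to_graph(tree):
    """Turn the nested-list tree into an unrooted graph: adjacency lists + leaf labels."""
    adj, labels, ids = {}, {}, count()

    def add(t):
        n = next(ids)
        adj[n] = []
        if isinstance(t, str):
            labels[n] = t
            return n
        for c in t:
            m = add(c)
            adj[n].append(m)
            adj[m].append(n)
        return n

    add(tree)
    return adj, labels


def side(adj, u, v):
    """Nodes reachable from v once the edge u-v is cut."""
    seen, stack = {v}, [v]
    while stack:
        for y in adj[stack.pop()]:
            if y != u and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def collapse_uniform_sides(adj, labels):
    """Repeatedly replace the largest edge side holding >= 2 leaves that share one
    label (with at least one leaf left on the other side) by a single leaf."""
    while True:
        best = None                           # (u, v, nodes, leaf count, label)
        for u in adj:
            for v in adj[u]:
                nodes = side(adj, u, v)
                found = [labels[x] for x in nodes if x in labels]
                if (len(found) >= 2 and len(set(found)) == 1
                        and len(found) < len(labels)
                        and (best is None or len(found) > best[3])):
                    best = (u, v, nodes, len(found), found[0])
        if best is None:
            return
        u, v, nodes, _, label = best
        adj[u].remove(v)                      # cut the edge, then drop that whole side
        for x in nodes:
            del adj[x]
            labels.pop(x, None)
        w = max(adj) + 1                      # fresh id for the replacement leaf
        adj[w], labels[w] = [u], label
        adj[u].append(w)


def to_newick(adj, labels):
    """Write the graph as Newick, rooted arbitrarily at its first internal node."""
    root = next((n for n in adj if n not in labels), None)
    if root is None:                          # only two leaves left
        return "(" + ",".join(labels.values()) + ");"

    def write(n, parent):
        if n in labels:
            return labels[n]
        return "(" + ",".join(write(m, n) for m in adj[n] if m != parent) + ")"

    return write(root, None) + ";"


def condense_unrooted(newick):
    adj, labels = to_graph(parse_newick(newick))
    collapse_uniform_sides(adj, labels)
    return to_newick(adj, labels)


print(condense_unrooted("(((A,A),(C,D)),E,E);"))   # ((C,D),A,E);
print(condense_unrooted("((A,A),(B,B),(C,C));"))   # (A,B,C);
```

The second call is the case that defeats rerooting on a leaf: every leaf belongs to an in-paralog pair, yet all three pairs are collapsed because the rule never needs a root.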