Tree shape verification #27 (naturalis / RiseAndFall)

rvosa commented 8 years ago

We need to come up with a general way to verify whether the trees that @dimbots is producing are more or less congruent with what we know about the systematics of the respective taxonomic groups. This probably means that we need to do at least the following:

collect benchmark trees to which we can compare our results. @tobiashofmann88 has started to do this as of commit 57f1b7072e78ceb6ddbac04867c4666a86f27cf5 but we need more. For the current milestone we need at least a tree for the monocots, and once we scale up we are going to need even more. Potentially we can get away with using the OTOL tree, although at present it is so polytomous that it is not a very convincing source of information.
taxonomically reconcile the benchmark trees with our own results. Since our taxon names are the canonical names from the NCBI taxonomy, the simplest approach is to rename the taxa in the benchmark trees to these canonical names as well, so that we can subsequently do simple string comparisons.
build a little workflow that, for each of our final_pruned.nex trees, i) prunes the benchmark tree and our tree to the same taxon set, which means that many of the benchmark taxa are pruned, and possibly also taxa from our own trees if they're not in the benchmark; and ii) does a topological comparison, e.g. using Robinson-Foulds (RF) distance.
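
To make the last step concrete, here is a minimal sketch of the prune-then-compare idea in plain Python. It is not the project's workflow: the function names are made up for illustration, trees are assumed to be given as Newick strings (not the NEXUS files mentioned above), branch lengths and internal node labels are ignored, and the RF distance is computed on the unrooted topologies.

```python
import re

def parse_newick(text):
    """Parse a Newick string into nested lists of tip names (lengths and internal labels ignored)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    stack, prev = [[]], None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            closed = stack.pop()
            stack[-1].append(closed)
        elif tok not in (",", ";") and prev != ")":
            label = tok.split(":")[0].strip()  # drop any branch length
            if label:
                stack[-1].append(label)
        prev = tok
    return stack[0][0]

def leaves(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def prune(node, keep):
    """Remove tips not in `keep`, collapsing nodes left with a single child."""
    if isinstance(node, str):
        return node if node in keep else None
    kids = [p for p in (prune(child, keep) for child in node) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else kids

def splits(tree):
    """Collect the non-trivial bipartitions of the unrooted tree."""
    taxa = leaves(tree)
    ref = min(taxa)  # each split is stored as the side that lacks this tip
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(frozenset(side))
        return clade
    walk(tree)
    return found

def rf_distance(newick_a, newick_b):
    """Prune both trees to their shared tips and return (RF distance, number of shared tips)."""
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    shared = leaves(a) & leaves(b)
    if len(shared) < 4:
        raise ValueError("need at least 4 shared tips for a meaningful comparison")
    a, b = prune(a, shared), prune(b, shared)
    return len(splits(a) ^ splits(b)), len(shared)

# Toy example: tip F occurs only in the second tree and is pruned away.
print(rf_distance("((A,B),((C,D),E));", "(((A,C),B),((D,E),F));"))  # (4, 5)
```

Because tips are matched by exact string comparison, this only works once both trees use the same canonical names, which is why the reconciliation step has to come first.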

I propose the following division of labour:

@dimbots makes sure that all final trees have consistent names, i.e. #12, #13, #14, #15, #18, #24
@dimbots looks for a useable benchmark tree for the monocots
@hettling worries about whether the benchmark trees are consistently named and TNRS'ed
@rvosa writes the verification workflow

rvosa commented 8 years ago

Here's another big mammal tree: http://www.nature.com/nature/journal/v446/n7135/full/nature05634.html

rvosa commented 8 years ago

For the monocots, and everything else, maybe we can use this: http://www.opentreeoflife.org/

rvosa commented 8 years ago

Question: how do we quantify this? @hettling suggests:

show side by side in a tanglegram, like in the SUPERSMART paper
using some metrics
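
One candidate metric, written out as a sketch: the RF distance counts the bipartitions (splits) found in only one of the two pruned trees, and dividing by the total number of non-trivial splits makes values comparable across taxon sets of different size. The symbols $S(T)$ (the set of non-trivial splits of tree $T$) and $n$ (the number of shared tips) are notation introduced here, not taken from the thread.

```latex
\mathrm{RF}(T_1, T_2) = \left| S(T_1) \,\triangle\, S(T_2) \right|,
\qquad
\mathrm{nRF}(T_1, T_2) = \frac{\mathrm{RF}(T_1, T_2)}{|S(T_1)| + |S(T_2)|}
```

For two fully resolved unrooted trees on $n \ge 4$ tips the denominator equals $2(n-3)$. With a polytomous benchmark such as the OTOL tree, $|S(T)|$ is smaller, so splits that the benchmark simply does not resolve still count as differences.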

hettling commented 8 years ago

TNRS'ed versions of the Casey Dunn and Faurby benchmark trees have been added with commit cc8c47ab08c509664b0795487bac2f38856d7ac8. Species that could not be mapped were removed from the trees (about one fifth in Faurby's tree and far fewer in Casey Dunn's). This was done with the new -s option in smrt-utils idmap.
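
For readers who want the gist of this step, the following is a minimal sketch of renaming tips to canonical names and dropping the ones that cannot be mapped. It does not show how smrt-utils idmap works internally: the function name and the tiny lookup table are hypothetical, and a real run would resolve names against the NCBI taxonomy.

```python
def reconcile_tips(tip_labels, lookup):
    """Map tip labels to canonical names; return (renamed, unmapped).

    `lookup` is a hypothetical dict from lower-cased names or synonyms
    to canonical names, standing in for a real taxonomic name service.
    """
    renamed, unmapped = {}, []
    for label in tip_labels:
        # Normalise underscores, repeated whitespace and case before matching.
        key = " ".join(label.replace("_", " ").split()).lower()
        if key in lookup:
            renamed[label] = lookup[key]
        else:
            unmapped.append(label)
    return renamed, unmapped

# Toy lookup table, for illustration only.
lookup = {
    "puma concolor": "Puma concolor",
    "felis concolor": "Puma concolor",  # synonym pointing at the canonical name
    "panthera leo": "Panthera leo",
}
tips = ["Felis_concolor", "Panthera_leo", "Smilodon_fatalis"]
renamed, unmapped = reconcile_tips(tips, lookup)
print(renamed)   # {'Felis_concolor': 'Puma concolor', 'Panthera_leo': 'Panthera leo'}
print(unmapped)  # ['Smilodon_fatalis']
```

The unmapped list is the set of tips that would be pruned from the benchmark tree, so its length relative to the number of tips gives the fraction lost.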

hettling commented 8 years ago

Trees can be compared and automatically pruned to the same taxon set using smrt-utils treedist, which offers a Euclidean branch-length difference and a Robinson-Foulds difference. What is still missing is a wrapper for ape's cophyloplot to plot the trees side by side.

rvosa commented 7 years ago

I think that we can now close this, given the explosion of detailed remarks generated by @hettling towards this milestone: https://github.com/naturalis/RiseAndFall/milestone/2
