matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

branch labels and node labels should be saved #149

Closed matsen closed 13 years ago

matsen commented 13 years ago

They should then be stored in a "confidence" vector or map in the placerun, keyed by node label.

matsen commented 13 years ago

OK, my brain finally woke up on this one, thanks to @breadbaron.

gtrees should not speak explicitly in terms of "boot", but just in terms of edge labels and node labels. That is all they are. We already have the node labels: they are the Newick gtree's "name".

We should rename boot to edge_label and name to node_label.

Then, when we are parsing a gtree, there should be a number_via_edge_label flag (off by default). When that is off, the edge labels go into the edge_label field and node labels go into the node_label field. When it is on, we use those edge labels as node numbers for the stree.

For example, for (a[3],b[5])c[7], a, b, and c are node labels, and 3, 5, and 7 are edge labels.
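To make the two modes concrete, here is a minimal Python sketch of the proposed parsing behaviour. It is not pplacer's OCaml code: the `Node` class and `parse` function are hypothetical, and only the names `node_label`, `edge_label`, and `number_via_edge_label` come from this thread. It handles just the simplified syntax of the example above (no branch lengths, no whitespace inside labels).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_label: Optional[str] = None   # Newick "name"
    edge_label: Optional[str] = None   # bracketed text, kept as a string
    number: Optional[int] = None       # set only when number_via_edge_label is on
    children: List["Node"] = field(default_factory=list)


# A name (possibly empty) followed by an optional [bracketed] label.
_LABELS = re.compile(r"([^\[\](),;]*)(?:\[([^\]]*)\])?")


def parse(newick: str, number_via_edge_label: bool = False) -> Node:
    text = "".join(newick.split()).rstrip(";")  # drop whitespace and the final ";"
    pos = 0

    def node() -> Node:
        nonlocal pos
        n = Node()
        if text.startswith("(", pos):
            pos += 1                            # consume "("
            n.children.append(node())
            while text.startswith(",", pos):
                pos += 1                        # consume ","
                n.children.append(node())
            if not text.startswith(")", pos):
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1                            # consume ")"
        m = _LABELS.match(text, pos)            # always matches, possibly empty
        pos = m.end()
        n.node_label = m.group(1) or None
        if m.group(2) is not None:
            if number_via_edge_label:
                n.number = int(m.group(2))      # bracket text becomes the node number
            else:
                n.edge_label = m.group(2)       # bracket text stays an edge label
        return n

    root = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at offset {pos}")
    return root
```

With the flag off, `parse("(a[3],b[5])c[7]")` keeps 3, 5, and 7 as edge labels. With `number_via_edge_label=True`, the same brackets fill `number` and `edge_label` stays empty.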

Now, when we are writing out a place file, we are using the edge labels for the node numbers, so we have to have them as a separate field. The node labels should go out as boot is now.

For phyloxml, we will need something that determines which labeling should be considered as confidence. It would seem fine to me to do the following, say for edge_labels:

and same for node_label.

habnabit commented 13 years ago

From an e-mail conversation:

My thought was to have these be parallel to the edge labels, and let the edge labels and the node labels be arbitrary strings.

((a:.01{2}[happy], b:.01{3}[24p])a_b:.01{1}[666], c:.01{4}[3.14])root{0}[0];

or without edge labels,

((a:.01{2}, b:.01{3})a_b:.01{1}, c:.01{4})root{0};

or without edge labels or node labels (except for leaves),

((a:.01{2}, b:.01{3}):.01{1}, c:.01{4}){0};
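As a rough illustration of how the three strings above decompose, the Python sketch below pulls out each node's fields with a regular expression. It assumes, from the examples only, that each node is written as an optional name, an optional `:length`, a `{number}` in braces, and an optional `[edge label]`. The names `NODE_SPEC` and `node_specs` are hypothetical, and this is not how pplacer itself parses trees.

```python
import re

# One node as it appears in the examples: name :length {number} [edge label]
NODE_SPEC = re.compile(
    r"(?P<node_label>[^\s(),:;{}\[\]]*)"   # optional name (may be empty)
    r"(?::(?P<length>[0-9.eE+-]+))?"       # optional branch length
    r"\{(?P<number>\d+)\}"                 # number in braces
    r"(?:\[(?P<edge_label>[^\]]*)\])?"     # optional arbitrary-string edge label
)


def node_specs(newick):
    """Return (node_label, number, edge_label) for every node, in textual order."""
    rows = []
    for m in NODE_SPEC.finditer(newick):
        rows.append((m["node_label"] or None, int(m["number"]), m["edge_label"]))
    return rows
```

Applied to the first string, this yields `happy`, `24p`, `666`, `3.14`, and `0` as edge labels, all kept as strings. Applied to the third, the internal nodes come back with no node label and no edge label.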

matsen commented 13 years ago

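Last thing to do concerns the interpretation of the branch labels and the node labels as being passed to PhyloXML. Here are the rules that will cover 99% of cases, I think, to determine what gets turned into the "confidence" value in newick_bark.ml:

  • If, between the edge labels and the node labels, there is only one that is given and numeric, then use that as the confidence value
  • Otherwise, default to using the edge labels.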

For the time being, we can ignore the other (that is not interpreted as confidence) when converting to phyloXML.

It can be labeled "confidence" rather than "bootstrap".
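The rule in this comment (if only one of the two labelings is given and numeric, use it; otherwise default to the edge labels) can be sketched in a few lines of Python. This is an illustration, not the code in newick_bark.ml. The helper names are hypothetical, and two points are assumptions the thread does not settle: the rule is applied per node, and a non-numeric edge label in the default case yields no confidence.

```python
from typing import Optional


def as_number(label: Optional[str]) -> Optional[float]:
    """Return the label as a float, or None if it is missing or not numeric."""
    if label is None:
        return None
    try:
        return float(label)
    except ValueError:
        return None


def confidence(edge_label: Optional[str], node_label: Optional[str]) -> Optional[float]:
    edge = as_number(edge_label)
    node = as_number(node_label)
    if node is not None and edge is None:
        return node   # only the node label is given and numeric
    return edge       # otherwise default to the edge label (assumed None if unusable)
```

For example, `confidence("happy", "70")` picks the node label, while `confidence("95", "70")` falls back to the edge label because both are numeric.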

@breadbaron, will this work for you?

gh-owestesson commented 13 years ago

Yep, it seems to work, thanks!

Oscar

On Fri, Oct 14, 2011 at 6:20 AM, Erick Matsen <reply@reply.github.com> wrote:

Last thing to do concerns the interpretation of the branch labels and the node labels as being passed to PhyloXML. Here are the rules that will cover 99% of cases, I think, to determine what gets turned into the "confidence" value in newick_bark.ml:

  • If, between the edge labels and the node labels, there is only one that is given and numeric, then use that as the confidence value
  • Otherwise, default to using the edge labels.

For the time being, we can ignore the other (that is not interpreted as confidence) when converting to phyloXML.

It can be labeled "confidence" rather than "bootstrap".

@breadbaron, will this work for you?


matsen commented 13 years ago

We're almost there, but when making a phyloXML tree, if we use a node label as a "confidence", it shouldn't be also used as a "name".
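A small Python sketch of the behaviour asked for here: when the node label supplies the confidence, no name element is written. The `clade` function and its `confidence_from` parameter are hypothetical, the output omits the phyloXML namespace and every other clade field, and `type="confidence"` is one reading of the earlier remark about labeling it "confidence" rather than "bootstrap".

```python
import xml.etree.ElementTree as ET


def clade(node_label, edge_label, confidence_from):
    """Build a bare <clade>; confidence_from is "node" or "edge" (hypothetical)."""
    el = ET.Element("clade")
    conf = node_label if confidence_from == "node" else edge_label
    # A node label that is used as the confidence must not also appear as a name.
    if node_label is not None and confidence_from != "node":
        ET.SubElement(el, "name").text = node_label
    if conf is not None:
        ET.SubElement(el, "confidence", {"type": "confidence"}).text = conf
    return el
```

Calling `clade("70", None, "node")` gives a clade with a confidence child and no name, whereas `clade("a_b", "95", "edge")` keeps `a_b` as the name and takes 95 as the confidence.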