Closed matsen closed 13 years ago
OK, my brain finally woke up on this one, thanks to @breadbaron.
gtrees should not be speaking explicitly in terms of "boot", but just in terms of edge labels and node labels. That is all they are. We already have the node labels in terms of Newick gtree's "name".
We should rename boot to edge_label and name to node_label.
Then, when we are parsing a gtree, there should be a number_via_edge_label flag (off by default).
When that is off, the edge labels go into the edge_label field and node labels go into the node_label field.
When it is on, we use those edge labels as node numbers for the stree.
For example, for (a[3],b[5])c[7], a, b, and c are node labels, and 3, 5, and 7 are edge labels.
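To make the flag's effect concrete, here is a rough sketch of the numbering decision for a single parsed node. The `labels` record, the `node_number` function, and the `fallback` argument are hypothetical names for illustration, not existing pplacer code. Falling back to the parser-assigned number when an edge label is missing, and rejecting non-integer edge labels, are both assumptions.

```ocaml
(* Hypothetical record: the two labels one parsed gtree node carries. *)
type labels = {
  node_label : string option;  (* e.g. "c" in (a[3],b[5])c[7] *)
  edge_label : string option;  (* e.g. "7" in (a[3],b[5])c[7] *)
}

(* Pick the stree node number for one parsed node.
   [fallback] is whatever number the parser would assign on its own. *)
let node_number ~number_via_edge_label ~fallback labels =
  if not number_via_edge_label then fallback
  else
    match labels.edge_label with
    | None -> fallback  (* assumption: no edge label means keep the default *)
    | Some s ->
      (match int_of_string_opt s with
       | Some n -> n
       | None -> invalid_arg ("non-integer edge label: " ^ s))
```

With the flag on, the node c[7] above would get number 7. With it off, it keeps the parser's number and "7" stays in edge_label.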
Now, when we are writing out a place file, we are using the edge labels for the node numbers, so we have to have them as a separate field. The node labels should go out as boot is now.
For phyloxml, we will need something that determines which labeling should be considered as confidence. It would seem fine to me to do the following, say for edge_labels:
if so, then label them like
[Myxml.tag "confidence" ~attributes:[("type", "edge_label")](Printf.sprintf "%g" boot)]) boot
and same for node_label.
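As a standalone illustration of the tagging idea, here is a sketch that renders a label as a confidence element with a type attribute. It builds the string directly instead of going through Myxml, so `confidence_tag` is a hypothetical helper. Treating "parses as a float" as the condition for emitting the tag is an assumption.

```ocaml
(* Render a label as a phyloXML-style confidence element, or None if the
   label does not parse as a number. [kind] is "edge_label" or "node_label".
   Assumption: only numeric labels are worth emitting as confidence. *)
let confidence_tag ~kind label =
  match float_of_string_opt label with
  | None -> None
  | Some x ->
    Some (Printf.sprintf "<confidence type=\"%s\">%g</confidence>" kind x)
```

The same function serves both labelings. Only the `kind` string changes.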
From an e-mail conversation:
My thought was to have these be parallel to the edge labels, and let the edge labels and the node labels be arbitrary strings.
((a:.01{2}[happy], b:.01{3}[24p])a_b:.01{1}[666], c:.01{4}[3.14])root{0}[0];
or without edge labels,
((a:.01{2}, b:.01{3})a_b:.01{1}, c:.01{4})root{0};
or without edge labels or node labels (except for leaves),
((a:.01{2}, b:.01{3}):.01{1}, c:.01{4}){0};
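For reference, a minimal writer for this layout might look like the sketch below. The `tree` record and both function names are made up for illustration. The layout is the one in the examples above: node label, then `:length`, then `{edge number}`, then `[edge label]`, with absent pieces omitted. Note that `%g` prints 0.01 rather than .01.

```ocaml
(* Hypothetical tree type carrying the fields used in the examples above. *)
type tree = {
  node_label : string option;
  branch_length : float option;
  edge_number : int;
  edge_label : string option;
  children : tree list;
}

(* Write one subtree as: (children)node_label:length{number}[edge_label] *)
let rec to_newick t =
  let subtree =
    match t.children with
    | [] -> ""
    | cs -> "(" ^ String.concat ", " (List.map to_newick cs) ^ ")"
  in
  (* Print an optional piece, or nothing at all if it is absent. *)
  let opt f = function None -> "" | Some x -> f x in
  subtree
  ^ opt (fun s -> s) t.node_label
  ^ opt (Printf.sprintf ":%g") t.branch_length
  ^ Printf.sprintf "{%d}" t.edge_number
  ^ opt (Printf.sprintf "[%s]") t.edge_label

(* A full tree string ends with a semicolon. *)
let newick_string t = to_newick t ^ ";"
```

Since both labels are plain optional strings, all three variants above come from the same writer, depending on which fields are set to None.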
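Last thing to do concerns the interpretation of the branch labels and the node labels as being passed to PhyloXML. Here are the rules that will cover 99% of cases, I think, to determine what gets turned into the "confidence" value in newick_bark.ml:
- If, between the edge labels and the node labels, there is only one that is given and numeric, then use that as the confidence value.
- Otherwise, default to using the edge labels.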
For the time being, we can ignore the other (that is not interpreted as confidence) when converting to phyloXML.
It can be labeled "confidence" rather than "bootstrap".
@breadbaron, will this work for you?
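The rule (use whichever of the edge labels or node labels is the only one given and numeric, otherwise default to the edge labels) could be sketched roughly as below. The names `source`, `given_and_numeric`, and `confidence_source` are hypothetical, not what is in newick_bark.ml. Reading "given and numeric" as "at least one label present, and every present label parses as a float" is an assumption. `List.filter_map` needs OCaml 4.08 or later.

```ocaml
(* Which labeling supplies the phyloXML confidence values. *)
type source = Edge_labels | Node_labels

(* Assumed reading of "given and numeric": at least one label is present
   and every present label parses as a float. *)
let given_and_numeric labels =
  let present = List.filter_map (fun x -> x) labels in
  present <> []
  && List.for_all (fun s -> float_of_string_opt s <> None) present

(* Node labels win only when they alone are given and numeric;
   every other case defaults to the edge labels. *)
let confidence_source ~edge_labels ~node_labels =
  match given_and_numeric edge_labels, given_and_numeric node_labels with
  | false, true -> Node_labels
  | _ -> Edge_labels
```

So a tree whose only numeric labels are node labels such as 95 and 80 would use those as confidence, while a tree with names as node labels falls through to the edge labels.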
Yep, it seems to work, thanks!
Oscar
On Fri, Oct 14, 2011 at 6:20 AM, Erick Matsen <reply@reply.github.com> wrote:
Last thing to do concerns the interpretation of the branch labels and the node labels as being passed to PhyloXML. Here are the rules that will cover 99% of cases, I think, to determine what gets turned into the "confidence" value in newick_bark.ml:
- If, between the edge labels and the node labels, there is only one that is given and numeric, then use that as the confidence value
- Otherwise, default to using the edge labels.
For the time being, we can ignore the other (that is not interpreted as confidence) when converting to phyloXML.
It can be labeled "confidence" rather than "bootstrap".
@breadbaron, will this work for you?
Reply to this email directly or view it on GitHub: https://github.com/matsen/pplacer/issues/149#issuecomment-2406459
We're almost there, but when making a phyloXML tree, if we use a node label as a "confidence", it shouldn't be also used as a "name".
It should then be stored in a "confidence" vector or map in the placerun, with keys being the node labels.