Closed mmore500 closed 5 months ago
Thanks for bringing this to our attention and this fix!
I appreciate the issue and the fix makes sense. I'll look into this in a bit, and please don't hesitate to send me a reminder if I dawdle.
One question I have is whether the '[&U]' is purely cosmetic or whether the tree.is_rooted property is actually set. If the latter, is it only changed if it is None, or is it otherwise unconditionally set?
Maybe not related: does it make sense to ask for an MRCA of an unrooted tree? Should we throw an error?
One question I have is whether the '[&U]' is purely cosmetic or whether the tree.is_rooted property is actually set. If the latter, is it only changed if it is None, or is it otherwise unconditionally set?
I did an additional experiment, and it looks like the rootedness is only changed to True from the None state. Here, I set the rootedness explicitly to False and it remained False after the mrca operation.
>>> import dendropy as dp
>>> tree = dp.Tree.get(
... data="((10,(((5,(11)),7,9))),((((1)))));",
... schema="newick",
... )
>>> leaf_taxa = [
... leaf.taxon for leaf in tree.leaf_node_iter()
... ]
>>> print("first tree.is_rooted", tree.is_rooted)
first tree.is_rooted None
>>> tree.is_rooted = False
>>> print("second tree.is_rooted", tree.is_rooted)
second tree.is_rooted False
>>> tree.mrca(taxa=leaf_taxa[:2])
<Node object at 0x7f7e81d0cd60: 'None' (None)>
>>> print("after tree.is_rooted", tree.is_rooted)
after tree.is_rooted False
Maybe not related: does it make sense to ask for an MRCA of an unrooted tree? Should we throw an error?
Not sure :sweat_smile: I was kind of wondering that too
It looks like the existing behavior is to use seed_node implicitly as the root, even when is_rooted is False.
Calculating the MRCA between leaves 10 and 11 on this tree gives the root node R as the result.
>>> import dendropy as dp
>>> tree = dp.Tree.get(
... data="(10,(((5,(11)),7,9)),((((1)))))R;",
... schema="newick",
... )
>>> leaf_taxa = [
... leaf.taxon for leaf in tree.leaf_node_iter()
... if leaf.taxon.label in ("10", "11")
... ]
>>> print(leaf_taxa)
[<Taxon 0x7efda9708ca0 '10'>, <Taxon 0x7efda9708f10 '11'>]
>>> print("first tree.is_rooted", tree.is_rooted)
first tree.is_rooted None
>>> tree.is_rooted = False
>>> print("second tree.is_rooted", tree.is_rooted)
second tree.is_rooted False
>>> print(tree.mrca(taxa=leaf_taxa[:2]))
<Node object at 0x7efda9708d90: 'R' (None)>
There could plausibly be an argument for calculating the MRCA between two nodes in an unrooted tree by taking the shortest path between them (which would be well-defined and unique) and selecting the node closest to the middle of this path. How the MRCA of more than two nodes might be interpreted in an unrooted tree isn't immediately obvious to me. Plus, in either case, I doubt there's a substantial intentional use case for MRCA in an unrooted tree anyway.
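To make the path-midpoint idea concrete, here is a rough sketch in plain Python. It is not DendroPy code: the adjacency-dict representation, the function names `path_between` and `midpoint_node`, and the choice to measure the path in edge counts rather than branch lengths are all assumptions made for illustration.

```python
from collections import deque


def path_between(adjacency, start, end):
    """Return the unique path from start to end in a tree, found by BFS."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    # Walk back from end to start using the recorded parents.
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def midpoint_node(adjacency, a, b):
    """Return the node nearest the middle of the a-b path (by edge count).

    When two nodes are equally close to the middle, the one nearer to
    `a` is returned, so the result depends on argument order in that case.
    """
    path = path_between(adjacency, a, b)
    return path[(len(path) - 1) // 2]


# Hypothetical unrooted tree: leaves A, B hang off x; leaves C, D hang off y.
example_adjacency = {
    "A": ["x"],
    "B": ["x"],
    "C": ["y"],
    "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}
print(midpoint_node(example_adjacency, "A", "B"))
```

The tie-breaking note in the docstring is one reason this definition is awkward: for a path with an odd number of edges there is no single middle node.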
In the scenario I ran into, I was basically relying on the ambiguity of the None rootedness state to prevent introduction of the [&R] or [&U] markers into newick string output. This can certainly be accomplished in other ways.
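One such way, sketched here under assumptions, is to save the rootedness flag before a call and restore it afterwards. The helper name `preserve_rootedness` and the stand-in class are hypothetical; the only thing assumed about the real tree object is that `is_rooted` can be read and assigned, as in the sessions above. Note that this only restores the flag and would not undo any structural change made to the tree during the call.

```python
from contextlib import contextmanager


@contextmanager
def preserve_rootedness(tree):
    """Restore tree.is_rooted on exit, whatever happened inside the block."""
    saved = tree.is_rooted
    try:
        yield tree
    finally:
        tree.is_rooted = saved


class StandInTree:
    """Hypothetical stand-in that mimics the reported side effect."""

    def __init__(self):
        self.is_rooted = None

    def mrca(self):
        self.is_rooted = False  # side effect analogous to the one reported
        return "some node"


stand_in = StandInTree()
with preserve_rootedness(stand_in):
    stand_in.mrca()
print(stand_in.is_rooted)
```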
Warning or throwing an error for the False and None states could be a reasonable choice, especially from a strict/literal interpretation of MRCA.
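As a rough illustration of the warning option, a caller-side wrapper could look like the sketch below. The function name `mrca_with_warning` and the message text are hypothetical, and this is not how the library itself behaves; it only assumes the tree exposes `is_rooted` and an `mrca` method that accepts keyword arguments, as in the sessions above.

```python
import warnings


def mrca_with_warning(tree, **kwargs):
    """Call tree.mrca(**kwargs), warning first if the tree is not marked rooted."""
    if tree.is_rooted is not True:
        warnings.warn(
            f"MRCA requested on a tree with is_rooted={tree.is_rooted!r}; "
            "the seed node will be treated as the root.",
            stacklevel=2,
        )
    return tree.mrca(**kwargs)
```

Raising an exception instead would be a one-line change, but a warning keeps existing callers that rely on the implicit root working.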
I did a quick sanity check to see if BioPython had made any kind of decision here, and it looks like they're also just using the implicit root of an unrooted tree without a warning or error.
>>> from io import StringIO
>>> from Bio import Phylo
>>> tree = Phylo.read(StringIO("(10,(((5,(11)),7,9)),((((1)))))R;"), "newick")
>>> tree.rooted
False
>>> leaf1 = tree.get_terminals()[0]
>>> leaf1
Clade(name='10')
>>> leaf2 = tree.get_terminals()[1]
>>> leaf2
Clade(name='5')
>>> mrca = tree.common_ancestor(leaf1, leaf2)
>>> mrca
Clade(name='R')
OK, let's go ahead and merge this. Maybe a warning would be a good thing to add, though.
Here is a minimum working example:
Works on multiple versions of Python, including Python 3.9.16 (main, Dec 7 2022, 01:12:08) [GCC 11.3.0] on linux with dendropy 4.5.2.
In a more copy/pasteable form:
This side effect can have very unexpected and perplexing consequences. Consider, for example:
(Note the addition of the [&U] prefix after calling mrca.)
As far as I can tell, what is happening is:
- Tree.mrca
- Tree.encode_bipartitions
  - suppress_unifurcations=False
  - collapse_unrooted_basal_bifurcation=True, suppress_storage=False, and is_bipartitions_mutable=False left as default
- Tree.collapse_basal_bifurcation
  - set_as_unrooted_tree=True left as default
- is_rooted = False