pylogeny / tree

Basic functionalities for handling phylogenetic trees, expanding the newick package
Apache License 2.0
0 stars 0 forks source link

major operations for tree manipulation #3

Open LinguList opened 3 years ago

LinguList commented 3 years ago

@PhyloStar will add specifics here, but we need something like:

These require that a node in the tree is placed somewhere else, so we need to be able to delete and insert parts of a tree (a node with its descendants).

LinguList commented 3 years ago

Since the newick class stores nodes by means of ancestor and descendant relations, most operations should be possible by simply modifying these relations. We have gained some experience with rooted trees, so we should stick with rooted trees so far and find simple solutions for these operations. That means also, that we define a branch by a node, and say that the edge defined by a node is from the ancestor of the node to the node.

We should also add a "reroot" function, as this makes it easier to avoid complex internal rerooting during nearest neighber exchange.

I have a first implementation ready, will share later.

LinguList commented 3 years ago

I learned one more thing about nearest-neighbor interchange now, when using rooted trees. It is not problematic at all, but we have three distinct scenarios:

  1. the root is the node we choose to select our nodes. In this case, the first bi-partition (binary trees only!) is root.descendants and the descendants thereof. If this does NOT yield a quartet, the root is not qualified for the operation.
  2. if the node is an internal node, we take the node as the first bipartition, and the second bipartition is the rest of the tree. In this case, we go up to the ancestor of the node. The descendant (minus the selected node itself) is the first part of the second bipartition. The second part of the second partition is the ancestor of the ancestor of the first node. If this IS the root, it is easy: we take the second descendant of the root as the second part of the second bipartition.
  3. But if this is NOT the case, we need to re-root the tree on the ancestor of the ancestor node and make the bipartition as in step 2.

I don't know how cost-intensive this routine is, and we may want to experiment with other ways (one can even use pure newick strick manipulation, I tested this before), but it would be what I understand as the rooted equivalent for nearest-neighbor interchange, and not that difficult to apply. As I have a first solution for re-rooting, I'd put a version of NNI along with this PR (probably next week?).

PhyloStar commented 3 years ago

We need to swap at the root, even if it is not a quartet. If one descendant of the root is a leaf, then, we can swap the leaf with one of the descendants of the leaf's sibling. We only need to check if a branch is not external.

Commented code from here: https://github.com/pylogeny/bayes/blob/master/src/cybayes/commands/mcmc_gamma.pyx#L235

shuffle_keys = list(temp_edges_list.keys()) # postorder list of edges labeled as a tuple of parent and child
random.shuffle(shuffle_keys) # shuffle the list of branches. We can select a node randomly.
for a, b in shuffle_keys:
    if b > config.N_TAXA:# and a != root_node: #select a branch if it not connected to a leaf. Only internal branch.
        x, y = nodes_dict[a], nodes_dict[b] #get the descendants of the nodes
        break

if x[0] == b: src = x[1] # select the sibling of the branch where NNI is performed.
else: src = x[0]
tgt = random.choice(y) #select the sibling's children
src_bl, tgt_bl = temp_edges_list[a, src], temp_edges_list[b, tgt] # update the branch lengths
del temp_edges_list[a,src], temp_edges_list[b, tgt] # delete older branches
temp_edges_list[a, tgt] = tgt_bl # update the branch lengths
temp_edges_list[b, src] = src_bl # update the branch lengths

How to update the branch lengths in pylotree format is the question. You know better, for sure.

PhyloStar commented 3 years ago

If we go for ultrametric trees, then, we cannot swap at the root if it is not a quartet. We need to think if we want to do ultrametric trees.

LinguList commented 3 years ago

I already looked how to do it. You need to modify the ancestor and descendant relations, that's all.

LinguList commented 3 years ago

Here's an example: my function for tree rerooting, for which I'll prepare the PR later (but am still thinking if we should make this a tree-internal function of the tree class, like: Tree.root_ad(node) returning a new tree.

def get_i(nodeA, nodeB):
    return [i for i in range(2) if nodeA.descendants[i].name ==
            nodeB.name][0] 

def root_at(otree, node_name):
    """
    Root the tree at a specific node.
    """

    # TODO: Edge3 does not work yet!
    tree = Tree(otree.newick)
    node = tree[node_name]
    root = tree.root
    ancestor = node.ancestor
    if ancestor.name == root.name:
        return otree

    # split tree in half
    partA = [d for d in root.descendants if [x for x in d.get_leaf_names() if x
        in node.get_leaf_names()]][0]
    partB = [d for d in root.descendants if not [x for x in d.get_leaf_names() if x
        in node.get_leaf_names()]][0]
    if partA.name != ancestor.name:
        partA.ancestor = None

    partB.ancestor = None

    # assign new data the new root
    root.descendants = [node, ancestor]
    queue = [(ancestor, ancestor.ancestor)]
    aa = ancestor.ancestor
    ancestor.descendants[get_i(ancestor, node)] = aa #ancestor.ancestor
    node.ancestor, ancestor.ancestor = root, root

    while queue:
        nn, aa = queue.pop(0)
        if aa.ancestor:
            aa.descendants[get_i(aa, nn)] = aa.ancestor
            queue += [(aa, aa.ancestor)]
            aa.ancestor = nn
        else:
            aa.descendants[get_i(aa, nn)] = partB
            partB.ancestor = aa
            aa.ancestor = nn
    return tree