jeetsukumaran / DendroPy

A Python library for phylogenetic scripting, simulation, data processing and manipulation.
https://pypi.org/project/DendroPy/.
BSD 3-Clause "New" or "Revised" License
210 stars 61 forks source link

Creating new tree from seed node removes node from original tree #138

Closed Adam-Gleave closed 6 months ago

Adam-Gleave commented 3 years ago

Hey, thanks for your work on this library!

I'm running into an issue that I don't quite understand when creating subtrees. I'm not sure if it's a problem with the code or if I'm just taking the wrong approach.

I have one large tree as my dataset, and in my analysis I am finding certain nodes, and then creating another tree using their parent node as the root. I wanted to create a straight copy of the structure, without any filter functions, and with all nodes intact. The base dataset is huge, so I didn't want to use the extract_* methods to filter through all the nodes that way. The problem is that once I have done this, all nodes in the new tree seem to have disappeared from the original dataset. Here is the line that causes the issue:

node = tree.find_node_with_taxon_label(id)

while True:
    node = node.parent_node
    if len(node.child_nodes()) > 1:
        break

subtree = dendropy.Tree(seed_node=node)

Now, if I try and find a node in the original tree, that was within the bounds of that subtree, it fails:

# Same ID as before, returns `None`
node = tree.find_node_with_taxon_label(id)

This seems a bit strange, since semantically I would have assumed that Python would treat the seed node assignment by retaining a reference to the original node, rather than moving everything. I'm by no means a Python expert though, so I could well be wrong...

Deep copying the node works fine:

subtree = dendropy.Tree(seed_node=copy.deepcopy(node))

Am I just making the wrong assumption on the behavior of this method here, or is there something going wrong?

If it's just me, I'd be happy to make a PR with a method that performs a deep copy of the seed node. This unfortunately stumped me for an embarassingly long amount of time!

mmore500 commented 6 months ago

Thanks for the report! Just added a warning on any implicit splice operations. The deepcopy is a good workaround, calling dendropy.Tree(seed_node=node.extract_subtree()) should work too.