I'm running into an issue that I don't quite understand when creating subtrees. I'm not sure if it's a problem with the code or if I'm just taking the wrong approach.
I have one large tree as my dataset, and in my analysis I am finding certain nodes, and then creating another tree using their parent node as the root. I wanted to create a straight copy of the structure, without any filter functions, and with all nodes intact. The base dataset is huge, so I didn't want to use the extract_* methods to filter through all the nodes that way. The problem is that once I have done this, all nodes in the new tree seem to have disappeared from the original dataset. Here is the line that causes the issue:
node = tree.find_node_with_taxon_label(id)
while True:
node = node.parent_node
if len(node.child_nodes()) > 1:
break
subtree = dendropy.Tree(seed_node=node)
Now, if I try and find a node in the original tree, that was within the bounds of that subtree, it fails:
# Same ID as before, returns `None`
node = tree.find_node_with_taxon_label(id)
This seems a bit strange, since semantically I would have assumed that Python would treat the seed node assignment by retaining a reference to the original node, rather than moving everything. I'm by no means a Python expert though, so I could well be wrong...
Am I just making the wrong assumption on the behavior of this method here, or is there something going wrong?
If it's just me, I'd be happy to make a PR with a method that performs a deep copy of the seed node. This unfortunately stumped me for an embarassingly long amount of time!
Thanks for the report! Just added a warning on any implicit splice operations. The deepcopy is a good workaround, calling dendropy.Tree(seed_node=node.extract_subtree()) should work too.
Hey, thanks for your work on this library!
I'm running into an issue that I don't quite understand when creating subtrees. I'm not sure if it's a problem with the code or if I'm just taking the wrong approach.
I have one large tree as my dataset, and in my analysis I am finding certain nodes, and then creating another tree using their parent node as the root. I wanted to create a straight copy of the structure, without any filter functions, and with all nodes intact. The base dataset is huge, so I didn't want to use the
extract_*
methods to filter through all the nodes that way. The problem is that once I have done this, all nodes in the new tree seem to have disappeared from the original dataset. Here is the line that causes the issue:Now, if I try and find a node in the original tree, that was within the bounds of that subtree, it fails:
This seems a bit strange, since semantically I would have assumed that Python would treat the seed node assignment by retaining a reference to the original node, rather than moving everything. I'm by no means a Python expert though, so I could well be wrong...
Deep copying the node works fine:
Am I just making the wrong assumption on the behavior of this method here, or is there something going wrong?
If it's just me, I'd be happy to make a PR with a method that performs a deep copy of the seed node. This unfortunately stumped me for an embarassingly long amount of time!