jeetsukumaran / DendroPy

A Python library for phylogenetic scripting, simulation, data processing and manipulation.
https://pypi.org/project/DendroPy/.
BSD 3-Clause "New" or "Revised" License

Duplicating trees while pruning #68

Closed: wrightaprilm closed this issue 8 years ago

wrightaprilm commented 8 years ago

Hi Jeet-

I encountered something and wasn't sure whether it was expected behavior.

When pruning tips from a tree, I've noticed that

tree.prune_taxa(first_foss)

returns a tree but that

tree1 = tree.prune_taxa(first_foss)

does not assign a tree object to tree1.

If that's the expected behavior, cool. If not, I can write up a more detailed example with an input file and put it somewhere you can access it.

jeetsukumaran commented 8 years ago

Hmmm. I don't think prune_taxa returns a tree. Did you mean some other function in the first line?

wrightaprilm commented 8 years ago

'Returns' was not the right way to phrase that. Looking at the source, it seems prune_taxa just prunes the listed taxa, in place, from the tree object that's in memory. So, if I understand this correctly, prune_taxa doesn't return anything, and assigning its result to a new name will give me nothing? And if I want to keep a copy of the unpruned tree in memory, I should copy the tree before pruning and perform the pruning on the copy. Is that right?

jeetsukumaran commented 8 years ago

Python functions return None by default. So your assignment statement does bind tree1 to the implicit return value of the method, which is None.
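A minimal illustration of this convention using only the standard library (no DendroPy needed): list.sort() mutates in place and returns None, just like prune_taxa. The prune helper below is hypothetical, written only to show that a function without an explicit return yields None.

```python
# Methods that mutate an object in place conventionally return None.
# list.sort() follows the same pattern as prune_taxa.
taxa = ["C", "A", "B"]

result = taxa.sort()  # sorts `taxa` in place
print(result)         # None: nothing useful was returned
print(taxa)           # ['A', 'B', 'C']: the original object changed


def prune(labels, to_remove):
    """Hypothetical helper: remove labels in place; no explicit return."""
    for label in to_remove:
        labels.remove(label)


kept = ["A", "B", "C", "D"]
out = prune(kept, ["B", "D"])
print(out)   # None: the implicit return value
print(kept)  # ['A', 'C']
```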

For what you want to do, yes, making a copy of the tree before pruning will work. But note that Python has name-binding semantics, not the memory-assignment semantics of C++.

So if you say:

tree_copy = tree1  # NOT actually a copy!!
tree1.prune_taxa(first_foss)
assert tree_copy is tree1  # passes: both names refer to the same object

you will not get anywhere, because tree_copy is just another name bound to the same object as tree1. Unlike in C/C++, assignment does NOT copy a value into a memory slot; it binds one name to the same object the other name is already bound to. You need to explicitly clone (deep-copy) the tree:

tree_copy = tree1.clone()

and then go ahead and make changes to tree1.
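To make the alias-versus-copy distinction concrete without needing DendroPy installed, here is a small sketch. ToyTree is a hypothetical stand-in, NOT DendroPy's Tree class: its clone() uses copy.deepcopy as an assumption standing in for Tree.clone(), and its prune_taxa only mimics the in-place, returns-None behavior discussed above.

```python
import copy


class ToyTree:
    """Hypothetical stand-in for a tree; NOT DendroPy's Tree class.

    It stores leaf labels only, which is enough to show the
    aliasing-versus-copying behavior discussed in this thread.
    """

    def __init__(self, leaves):
        self.leaves = list(leaves)

    def clone(self):
        # Assumption: a deep copy stands in for Tree.clone(); the result
        # is an independent object.
        return copy.deepcopy(self)

    def prune_taxa(self, labels):
        # Modify this object in place and (implicitly) return None.
        self.leaves = [leaf for leaf in self.leaves if leaf not in labels]


tree1 = ToyTree(["A", "B", "C", "D"])

alias = tree1           # binds a second name to the SAME object
backup = tree1.clone()  # creates a NEW, independent object

tree1.prune_taxa({"B", "D"})

print(alias is tree1)   # True: the alias sees the pruning
print(alias.leaves)     # ['A', 'C']
print(backup is tree1)  # False
print(backup.leaves)    # ['A', 'B', 'C', 'D']: the unpruned copy survives
```

Only the cloned name keeps the unpruned state; the plain assignment gives you nothing extra.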

Apologies if all of this is already familiar to you! But sometimes, especially if you switch back and forth between C/C++ and Python, it is easy to get tripped up by Python's name-binding semantics vs. C/C++ memory-value assignment semantics.

Note that if you do not need any metadata annotations, you should consider using the extract_tree family of methods (https://pythonhosted.org/DendroPy/primer/treemanips.html#extracting-trees-and-subtrees-from-an-existing-tree). They are much faster and more efficient than cloning and then pruning.
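Why extraction can be cheaper is easiest to see in a toy version. The sketch below is hypothetical and is NOT DendroPy's implementation (see the linked primer for the real method names and signatures): instead of copying everything and then deleting, it builds only the nodes that survive, never touches the source tree, skips annotations, and collapses internal nodes left with a single child.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical minimal node: a label (for leaves) plus children."""
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    annotations: dict = field(default_factory=dict)  # deliberately NOT copied


def extract_without(node: Node, drop: Set[str]) -> Optional[Node]:
    """Build a NEW tree omitting leaves whose label is in `drop`.

    The source is never modified and annotations are not carried over.
    Internal nodes left with one child are collapsed; nodes left with
    no children disappear (returned as None).
    """
    if not node.children:  # leaf
        return None if node.label in drop else Node(label=node.label)
    extracted = (extract_without(child, drop) for child in node.children)
    kept = [child for child in extracted if child is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress the unifurcation
    return Node(children=kept)


def leaf_labels(node: Node) -> List[str]:
    """Leaf labels in left-to-right order."""
    if not node.children:
        return [node.label]
    return [label for child in node.children for label in leaf_labels(child)]


# ((A,B),(C,D)) with an annotation on the root
source = Node(
    children=[
        Node(children=[Node("A"), Node("B")]),
        Node(children=[Node("C"), Node("D")]),
    ],
    annotations={"note": "not copied"},
)

pruned = extract_without(source, {"B", "D"})
print(leaf_labels(pruned))  # ['A', 'C']
print(leaf_labels(source))  # ['A', 'B', 'C', 'D'] (source untouched)
print(pruned.annotations)   # {}
```

The work done is proportional to what is kept, and no metadata is duplicated, which is the trade-off that makes extraction attractive when annotations aren't needed.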

wrightaprilm commented 8 years ago

OK, thanks! That answers my question.

I'll give the extract methods a try, since I'm not using metadata.