Closed pekarj closed 7 months ago
Can you try running extract_tree_without
with suppress_unifurcations=False
? I have suppress_unifurcations=True
as the default behavior because that's what most folks expect when they extract the pruned tree, but in this specific case, I think setting it to False
to avoid suppressing unifurcations will preserve the mutations you're interested in. Specifically, given the file you attached, I ran the following:
tree = treeswift.read_tree_nexus(tree_path)['tree1']
exclude = ['L4']
# note the `suppress_unifurcations=False` argument I added
tree_pruned = tree.extract_tree_without(exclude, suppress_unifurcations=False)
print("=== ORIG ===")
for n in tree.traverse_internal():
if 'node_params' in dir(n):
print(n, n.node_params)
print()
print("=== PRUNE ===")
for n in tree_pruned.traverse_internal():
if 'node_params' in dir(n):
print(n, n.node_params)
and it prints the following (which I think should be correct based on your description):
=== ORIG ===
I4 &mutations="M1"
I5 &mutations="M2,M3,M4"
I6 &mutations="M5,M6,M7"
I10 &mutations="M8"
I11 &mutations="M10"
I12 &mutations="M11"
I13 &mutations="M17,M18,M19"
I14 &mutations="M20"
=== PRUNE ===
I4 &mutations="M1"
I5 &mutations="M2,M3,M4"
I6 &mutations="M5,M6,M7"
I10 &mutations="M8"
I11 &mutations="M10"
I12 &mutations="M11"
I13 &mutations="M17,M18,M19"
I14 &mutations="M20"
If this is indeed the case (that setting suppress_unifurcations=False
produces the desired results), I can try adding a warning if suppress_unifurcations
would result in the deletion of an internal node that has non-empty attributes to help programmers catch when this is happening. Thoughts?
Yes, that fixes the issue. Thanks! I think adding the warning is fine for now (and maybe pointing out which node(s) with attributes was deleted); I'm not sure if this will be a frequent issue for people. In my case, I wanted to prune from 20 leaves to 3 and realized I needed to pick the "correct" 3 leaves to keep the mutations.
The easiest fix might be just traversing up until you get to a node with an edge length > 0 and then assigning the attributes to that node, but that would probably have its own concerns.
Glad to hear that fixed the immediate issue!
Unfortunately, while your described approach might work for some scenarios, I think it might cause problems in others. I personally think keeping the unification in the tree would be the most appropriate if there's information in the internal nodes that you want to preserve, as you're otherwise losing information about when the mutations occurred, but others might not, so leaving this as something folks can implement on their own however works best for their specific problem is probably the right approach for the library as a whole
I'll add a warning if nodes with non-trivial info would be deleted (and add an argument to suppress the warning if desired), so hopefully that'll help avoid silently deleting critical info!
That sounds good. Thanks, Niema!
Warning added in TreeSwift v1.1.40 (commit https://github.com/niemasd/TreeSwift/commit/94a986231bb6f39fafe4cab1c39df2c54f6b2ece). Thanks, Jonathan!
Hey Niema, thanks for the continued updating of this software.
If I have a tree with branch X with descendant leaves A, B, and C and several mutations on branch X, and then I extract a subtree with just one of those three leaves, I'll lose the mutations corresponding to that branch because they would have been part of the
node_params
of an internal node. This will also happen if I extract with two of the three leaves but don't include the leaf descending from that internal node (since the relevant internal node will be deleted and then other internal node would be selected as its replacement, I think). I'm unsure whether this would be an easy fix and it can be straightforward to spot this issue with a small tree, but the behavior might pose a bigger problem with larger trees.In the example below, if I extract the tree without leaf L4, I lose internal node I6 and therefore the mutations M5, M6, and M7.
Tree attached.
tree_simple.nexus.gz