Closed wsdewitt closed 7 years ago
As per comments in #180, we'd like to make it possible to ensure that any seeds present in a cluster end up the pruned output. I left a review comment about where I think that can fit.
My tiny test tree is:
(((seed:1,(l3:1,l4:2)s2:1,(l5:3,l6:4)s3:1)b:1,(l1:1,l2:2)s1:1),naive0:1);
/-seed
|
| /-l3
/-|--|
| | \--l4
| |
| | /---l5
/-| \-|
| | \----l6
| |
--| | /-l1
| \-|
| \--l2
|
\-naive0
It has 3 subtrees each with 2 leaves. The branch lengths are l1 = 1, l2 = 2, l3 = 1, l4 = 2, l5 = 3, l6 = 4
. So if prune.py
is working right, and we ask for 3 leaves, we expect to and do get
naive0
l3
l1
l5
seed
With the old code, we don't get any leaves from the l5,l6
subtree, since they're all further away than the nodes in the other two subtrees.
The leaves_to_keep
set cannot include naive or seed, but we want them in the output taxa IIUC. The reversal of order happens in the indexing on line 70. Maybe slice is not clear, prefer list(reversed(my_list))
?
@metasoarous with the current prune.py
, if we ask for --n 100
seqs, we'll actually get 102 seqs because of the addition of naive and seed. Is this ok by you?
@WSDeWitt I think it would be better to keep it at 100. I can tweak this though when I do my merge.
closes #12