matsengrp / cft

Clonal family tree
5 stars 3 forks source link

12 pruning #184

Closed wsdewitt closed 7 years ago

wsdewitt commented 7 years ago

closes #12

metasoarous commented 7 years ago

As per comments in #180, we'd like to make it possible to ensure that any seeds present in a cluster end up the pruned output. I left a review comment about where I think that can fit.

wsdewitt commented 7 years ago

My tiny test tree is:

(((seed:1,(l3:1,l4:2)s2:1,(l5:3,l6:4)s3:1)b:1,(l1:1,l2:2)s1:1),naive0:1);
         /-seed
        |
        |   /-l3
      /-|--|
     |  |   \--l4
     |  |
     |  |   /---l5
   /-|   \-|
  |  |      \----l6
  |  |
--|  |   /-l1
  |   \-|
  |      \--l2
  |
   \-naive0

It has 3 subtrees each with 2 leaves. The branch lengths are l1 = 1, l2 = 2, l3 = 1, l4 = 2, l5 = 3, l6 = 4. So if prune.py is working right, and we ask for 3 leaves, we expect to and do get

naive0
l3
l1
l5
seed

With the old code, we don't get any leaves from the l5,l6 subtree, since they're all further away than the nodes in the other two subtrees.

wsdewitt commented 7 years ago

The leaves_to_keep set cannot include naive or seed, but we want them in the output taxa IIUC. The reversal of order happens in the indexing on line 70. Maybe slice is not clear, prefer list(reversed(my_list))?

wsdewitt commented 7 years ago

@metasoarous with the current prune.py, if we ask for --n 100 seqs, we'll actually get 102 seqs because of the addition of naive and seed. Is this ok by you?

metasoarous commented 7 years ago

@WSDeWitt I think it would be better to keep it at 100. I can tweak this though when I do my merge.