Closed (matsen closed 12 years ago)
This has been implemented in the branch for #192.
This is totally cool, and seems to work in the examples I checked out. Nice.
It would be wonderful to have a -t flag that would write out the
discordance tree at the level convexified. This "suggestion" tree would
indicate the new name. I think that a useful and easy-to-implement way
to go would be to append the suggested name on to the sequence id. Thus
if S001576771 gets reclassified to Pseudomonas monteilii then the new
sequence id would be
S001576771 -> Pseudomonas monteilii
This would make it easy to review the reclassifications.
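A minimal sketch of the relabeling idea, in Python for illustration only (not the project's actual code). The helper names `suggested_leaf_label` and `relabel_leaves`, the `suggestions` mapping, and the second sequence id are all hypothetical.

```python
def suggested_leaf_label(seq_id: str, new_name: str) -> str:
    """Append the suggested taxonomic name to a sequence id."""
    return f"{seq_id} -> {new_name}"


def relabel_leaves(leaf_ids, suggestions):
    """Return new leaf labels; leaves without a suggestion keep their id.

    `suggestions` is a hypothetical mapping from sequence id to the
    suggested name for that sequence.
    """
    return [
        suggested_leaf_label(leaf, suggestions[leaf]) if leaf in suggestions else leaf
        for leaf in leaf_ids
    ]


# "S000000001" is a made-up id standing in for a leaf that was not reclassified.
labels = relabel_leaves(
    ["S001576771", "S000000001"],
    {"S001576771": "Pseudomonas monteilii"},
)
print(labels)
```

One thing to watch: labels containing spaces or ">" would probably need quoting if the tree is written as Newick.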
And one more thing. If we could have "-" replace any non-normal float, that would be super. I.e. we shouldn't have any -nan in the avg distances.
Then an explanation of why we don't do the calculation in the docs... because comparing distances between ranks is not so meaningful.
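A rough sketch of the "-" replacement, again in Python purely as illustration. It assumes "non-normal" means nan or infinite (a zero distance is kept as a real value). The function names are hypothetical.

```python
import math


def format_avg_dist(value: float, placeholder: str = "-") -> str:
    """Format an average distance, hiding nan/inf behind a placeholder."""
    # Assumption: "non-normal" is read here as "not finite" (nan or +/-inf);
    # zero is treated as a legitimate distance and printed as-is.
    if not math.isfinite(value):
        return placeholder
    return f"{value:g}"


def mean_or_nan(distances):
    """Average of a list of distances; nan when there is nothing to average."""
    return sum(distances) / len(distances) if distances else float("nan")


print(format_avg_dist(mean_or_nan([0.5, 0.25])))
print(format_avg_dist(mean_or_nan([])))
```

The empty case is the one that would otherwise leak a nan into the output, e.g. when no leaves keep the classification being averaged over.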
This is going to be an ongoing project, but this one and its friend infer
should go ahead and get merged into dev.
This extends the work done in #171.
This will only be done with taxids that are less specific than a given rank. The default should be species, but this should be specifiable through a --max-rank flag.

For every rank that is less specific than the max-rank flag, and for every leaf that is not convex at that rank:

- orig_taxid is the original taxid
- guppy infer without the distance requirement, getting a new taxid new_taxid

- seq_name: the leaf sequence name
- old_taxid:
- new_taxid:
- old_name: the name corresponding to old_taxid
- new_name: the name corresponding to new_taxid
- makes_convex: does new_taxid sit inside the list of taxids that would make the tree convex
- old_avg_dist: what is the average distance from the leaf to leaves maintaining their classification of old_taxid after the convexify step
- new_avg_dist: what is the average distance from the leaf to leaves maintaining their classification of new_taxid after the convexify step
- n_with_old: number of sequences with the old taxid
- n_nonconvex: the number of the sequences with the old taxid that were called nonconvex by the convexify step
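To make the proposal concrete, here is a hedged Python sketch of the rank filter and of one output row with the fields listed above. Several things are assumptions: the simplified `RANKS` ordering, CSV as the output format, and every value in the example row other than the sequence id and new name (the taxids, old name, distances, and counts are made-up placeholders). "Less specific than" is read literally, so the max rank itself is excluded.

```python
import csv
import io

# Assumption: a simplified rank ordering, least specific first.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]

# Field names taken from the list above.
COLUMNS = [
    "seq_name", "old_taxid", "new_taxid", "old_name", "new_name",
    "makes_convex", "old_avg_dist", "new_avg_dist", "n_with_old", "n_nonconvex",
]


def ranks_to_check(max_rank: str = "species"):
    """Ranks strictly less specific than max_rank (literal reading of the text)."""
    return RANKS[:RANKS.index(max_rank)]


def write_rows(rows, out):
    """Write a header line followed by one line per reclassified leaf."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row: only seq_name and new_name come from the example in this thread.
row = {
    "seq_name": "S001576771",
    "old_taxid": "old_taxid_placeholder",
    "new_taxid": "new_taxid_placeholder",
    "old_name": "old_name_placeholder",
    "new_name": "Pseudomonas monteilii",
    "makes_convex": True,
    "old_avg_dist": "-",   # e.g. a non-finite average shown as "-"
    "new_avg_dist": 0.01,  # invented value
    "n_with_old": 3,       # invented value
    "n_nonconvex": 1,      # invented value
}

buf = io.StringIO()
write_rows([row], buf)
print(ranks_to_check("species"))
print(buf.getvalue())
```

Keeping the column list in one place means the header and the rows cannot drift apart if fields get added later.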