eharkins closed this issue 5 years ago.
It depends what the annotate command is. As I understand it, it's running annotate with --all-seqs-simultaneous, so the idea is that there will be one event in the output. But, as we've discussed in the past, it's impossible to force everything into a single event in all circumstances, so there could be more than one event, and assuming any particular ordering of those events would be a bad idea.
But even setting that aside, it seems needlessly risky to set things up so that if we ever change the annotate command and forget to update this code, it breaks horribly. I think it's much better to explicitly enforce that what you get back from the annotate command is what you expect.
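For illustration, a minimal sketch of that kind of check, assuming events is the list of annotations read back from the annotate run; get_single_event is a hypothetical helper name, not an existing partis or cft function:

```python
def get_single_event(events):
    """Return the only event in `events`, failing loudly otherwise.

    Hypothetical helper: `events` is assumed to be the list of annotations
    read back from a partis annotate run. Instead of silently taking
    events[0], refuse to continue if the output isn't what we expect.
    """
    if len(events) != 1:
        raise ValueError(
            "expected exactly one event from annotate, got %d" % len(events))
    return events[0]
```

If a future change to the annotate command ever produces zero or several events, this raises immediately rather than quietly merging metrics from the wrong event.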
Makes sense, thanks!
@psathyrella This seems to further highlight to me the value of pulling out the lbi/lbr computation code from the annotate command. It feels a little silly to be worrying about multiple events for a computation which really just depends on an input tree.
@csmall well yes, in principle the bare lb metrics depend only on the tree, but the stuff around them depends on other information in the annotation, e.g. affinity and multiplicity. I think I dug up the details the first time we discussed this. I agree it's worth eventually separating the two, but it requires a bunch of code duplication: we'll need infrastructure to pull the other info out of the annotation, pass it in separately, verify it's sensible, and then put it back into the annotation afterward. For context, cft is not the only place the lb functions are called from. It's on my todo list, but I won't get to it in the next couple of days.
LOL! Once again, I am not csmall here. Sorry Craig...
I'm still not sure what you mean by "the stuff around them", though I do remember us talking about it at lunch. If there's anything you can link us to here for posterity, that would be great.
I'm all for reducing code duplication when it doesn't overly complicate things. It would be great if the additional information could optionally be passed in through an API call or some such, as needed. But I realize that's sometimes easier said than done, so I'll leave it up to you to decide on the best solution.
Glad this is on your todo list; no major rush as far as I'm concerned.
See https://github.com/matsengrp/cft/pull/261#discussion_r255634321
When we merge selection metrics from a partis annotate command, we take events[0] instead of finding the appropriate cluster annotation as is done in process_partis.py.
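A minimal sketch of the alternative, matching the event to the cluster by its sequence ids rather than by position. It assumes each event is a dict with a 'unique_ids' list (as in partis annotations) and that the cluster's ids are known; find_cluster_annotation is a hypothetical name, and this is not necessarily how process_partis.py does it:

```python
def find_cluster_annotation(events, cluster_ids):
    """Return the event whose sequence ids exactly match `cluster_ids`.

    Assumes each event is a dict with a 'unique_ids' list. Comparison is by
    set, so the order of ids (and of events) doesn't matter. Raises if there
    isn't exactly one match, so a mismatch can't go unnoticed.
    """
    target = set(cluster_ids)
    matches = [e for e in events if set(e["unique_ids"]) == target]
    if len(matches) != 1:
        raise ValueError(
            "expected one annotation matching the cluster, found %d" % len(matches))
    return matches[0]
```

Because the lookup doesn't depend on event ordering, it keeps working even if annotate returns more than one event.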
Duncan said:
@metasoarous, were you taking the index-0 event before because partis annotate is called here on a cluster-by-cluster basis, so there is only ever one event, and it's therefore OK to take the first one as long as we have a check like the one Duncan suggests? @psathyrella, since events comes from the results of annotate with --get-tree-metrics on a particular tree, is it reasonable to assume it's OK to take events[0]? Or should we do something like process_partis.py?