jeromekelleher closed this 5 years ago
Agree.
> The connection with the Sankoff algorithm is neat and we should probably think about this a bit more and maybe return to it in the discussion. Are there classical phylogenetics algorithms we can speed up by maintaining vectors efficiently across trees?
Good point! The "peeling" algorithm for likelihood computation could totally be done this way. Concretely, suppose that at each of a sequence of positions along the genome we have a set of observed discrete "phenotypes" (these might be nucleotide states, so I should probably say genotypes) and a Markov transition matrix; the task is to compute the probability of the observed phenotypes given the tree sequence.
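For concreteness, here is a minimal sketch of the single-site, single-tree version of this computation, in the style of Felsenstein's pruning algorithm. Each node gets a vector of partial likelihoods (one entry per state) computed bottom-up from its children. It assumes a separate transition matrix per branch (e.g. a rate matrix already exponentiated by branch length) and a root distribution `pi`; the dictionary tree encoding and the name `peel_likelihood` are hypothetical. A tree-sequence version would repeat this per site, ideally updating the per-node vectors incrementally between trees rather than recomputing them as done here.

```python
import numpy as np


def peel_likelihood(children, root, leaf_states, P, pi):
    """Probability of the observed leaf states at one site on one tree.

    children: dict mapping node -> list of child nodes ([] for leaves).
    leaf_states: dict mapping leaf node -> observed state index.
    P: dict mapping each non-root node c -> (k x k) transition matrix for
       the branch above c, with P[c][a, b] = Pr(state b at c | state a at parent).
    pi: length-k distribution of the state at the root.
    """
    k = len(pi)

    def partial(u):
        # partial(u)[a] = Pr(observed leaf states below u | state a at u)
        if not children[u]:
            vec = np.zeros(k)
            vec[leaf_states[u]] = 1.0
            return vec
        vec = np.ones(k)
        for c in children[u]:
            # (P[c] @ L_c)[a] = sum_b P[c][a, b] * L_c[b]
            vec *= P[c] @ partial(c)
        return vec

    return float(pi @ partial(root))
```

For a cherry with two leaves both in state 0, a symmetric matrix with 0.9 on the diagonal and a uniform root distribution, this gives 0.5 * 0.9^2 + 0.5 * 0.1^2 = 0.41 (up to rounding).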
> Might as well keep a version of the algorithms here, right?
That seems right to me, yes.
OK, great. Shall we merge this much?
Yes, looks great. You can merge too, right?
Yeah, I can merge. Just wondering what the protocol for updates should be. Shall we just push to master ourselves, or open PRs and let the other person review and merge if happy?
Either way, but PRs make it easy to see what's changed, if there are enough changes to make it worthwhile.
OK, let's assume that if a PR is opened, it's up to the other person to merge it.
Here are some minor tweaks to the treestats paper @petrelharp. Nothing of any particular importance here, just some tidy-ups and a few comments.
One thing I think would help clear up the notation a bit is to use $u$ and $v$ to refer to nodes and keep $i$, $j$, etc. for referring to sites. At the moment I find it confusing trying to figure out what a particular $w$ means, whereas seeing $w(u)$ would immediately tell me that it's the weight for a node.
The connection with the Sankoff algorithm is neat and we should probably think about this a bit more and maybe return to it in the discussion. Are there classical phylogenetics algorithms we can speed up by maintaining vectors efficiently across trees?
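For reference, a minimal sketch of the Sankoff (weighted parsimony) recursion for a single site on a single tree. Each node carries a vector of minimum costs, one entry per state, computed bottom-up from its children's vectors; these per-node vectors are the kind of thing one would hope to maintain incrementally across trees instead of recomputing from scratch as this sketch does. The dictionary tree encoding and the name `sankoff` are illustrative assumptions.

```python
import math


def sankoff(children, root, leaf_states, cost):
    """Minimum weighted-parsimony cost for one site on one tree.

    children: dict mapping node -> list of child nodes ([] for leaves).
    leaf_states: dict mapping leaf node -> observed state index.
    cost: k x k nested list; cost[a][b] is the cost of a change a -> b.
    """
    k = len(cost)

    def vec(u):
        # S_u[a] = minimum cost of the subtree below u, given state a at u
        if not children[u]:
            return [0.0 if a == leaf_states[u] else math.inf for a in range(k)]
        s = [0.0] * k
        for c in children[u]:
            sc = vec(c)  # computed once per child
            for a in range(k):
                s[a] += min(cost[a][b] + sc[b] for b in range(k))
        return s

    return min(vec(root))
```

With unit costs (`[[0, 1], [1, 0]]`) this reduces to Fitch parsimony: for the tree `((1, 2), 0)` with leaf states 0, 1, 1 it returns one change.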
My next step would probably be to pick up the implementation (both in Python and as algorithm listings), starting with a literal naive version and working towards an efficient general algorithm. Might as well keep a version of the algorithms here, right? How do you see it developing?
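As a possible starting point, a sketch of what a "literal naive version" might look like, assuming a branch statistic of the form: average over the sequence of the sum, over branches $(u, p)$ of each tree, of branch length $t(p) - t(u)$ times $f(w(u))$, where $w(u)$ is the total weight of the samples below node $u$. The input format (a list of `(span, parent)` pairs, node times increasing towards the root) and the name `naive_branch_stat` are hypothetical, not the paper's or tskit's API. It deliberately recomputes everything per tree, walking each node's weight up to every ancestor.

```python
import numpy as np


def naive_branch_stat(trees, times, weights, f, sequence_length):
    """Per-tree, from-scratch evaluation of a branch-length statistic.

    trees: list of (span, parent) pairs; parent[u] is the parent of node u
           in that tree, or -1 for roots and nodes not in the tree.
    times: array of node times (parents are older than children).
    weights: (num_nodes x m) array; row u is the weight vector of node u
             (zero rows for non-sample nodes).
    f: summary function mapping a length-m vector to a float.
    """
    total = 0.0
    for span, parent in trees:
        # w[u] = sum of weights of all nodes in the subtree below u
        w = np.zeros_like(weights, dtype=float)
        for v in range(len(parent)):
            u = v
            while u != -1:
                w[u] += weights[v]
                u = parent[u]
        # each branch contributes span * branch length * f(w(u))
        for u in range(len(parent)):
            p = parent[u]
            if p != -1:
                total += span * (times[p] - times[u]) * f(w[u])
    return total / sequence_length
```

With one unit weight per sample and $f(x) = x(n - x)$ for $n$ samples, the per-tree sum is the total branch-length distance over all pairs of samples; for two samples joined at time 1 this gives 2.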