matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0
75 stars 18 forks source link

implement KR distance of Voronoi regions to leaves #94

Closed matsen closed 13 years ago

matsen commented 13 years ago

Assume we have a Voronoi region V associated with a leaf l, containing a collection of mass M. We would like to be able to quickly calculate the KR distance between M and the amount of mass in M concentrated at the leaf l.

Thus, we would like a function that takes a leaf and a (sub) Indiv mass map and calculates the required distance. Note that Kr_distance.dist is ready to do this sort of calculation.

We would also like a function that takes a Voronoi region and pulls out the Indiv mass map that is contained in the Voronoi region.

Combining these together meets the need, and will allow us to cache these sub Indiv masses for faster comuputation in the complete greedy voronoi algorithm.

matsen commented 13 years ago

Note that I've changed the name of this issue.

Algorithm:

Start with S being all of the leaves of the tree.

From a programming perspective, we can keep track the Voronoi regions as well as maintain a map from S to the leaf KR work as above. Once one is selected to remove, we just have to recompute those Voronoi regions contacting it, as well as their leaf KR work. We might also maintain a list of leaves ordered by the leaf KR work, but I'll leave that up to you.

habnabit commented 13 years ago

This should be essentially done now. The last thing, as I understand it, is to do a bit of refactoring and then test the value of the KR distance evaluated at a number of leaves on a diagram.

matsen commented 13 years ago

Yes, and there are some simple place files in tests/data/simple

On Fri, Jul 15, 2011 at 4:04 PM, habnabit reply@reply.github.com wrote:

This should be essentially done now. The last thing, as I understand it, is to do a bit of refactoring and then test the value of the KR distance evaluated at a number of leaves on a diagram.

Reply to this email directly or view it on GitHub: https://github.com/matsen/pplacer/issues/94#issuecomment-1583923

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

matsen commented 13 years ago

This looks good, but there are a couple of XXX's to clean up in rppr_voronoi.ml. They are minor, and actually explain what I was trying to remember about the voronoi speedup.

I do feel strongly, though, about "diagram" versus "graph." The former is standard in the literature, and the second is misleading.

matsen commented 13 years ago

This code looks awesome and ready to merge. It's tough for me to evaluate your unit tests, though. If you have doodles I'd like to scan them.

habnabit commented 13 years ago

I have a sketch of the tree being used for the voronoi code in general, but not specifically for evaluating the KR distance on the masses. I ran it myself and recorded the values that it produced as the expected values for the tests.