soft-matter / trackpy-examples

sample images, examples, and speed tests for trackpy

ENH: Add tutorials for subnets and adaptive search #17

Closed: nkeim closed this 9 years ago

nkeim commented 9 years ago

This is a companion PR to https://github.com/soft-matter/trackpy/pull/192.

You can view the 3 new notebooks in nbviewer:
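For context, the feature these notebooks demonstrate is driven entirely through the linking call. Here is a minimal sketch of an invocation, assuming the `adaptive_stop`/`adaptive_step` keyword names used in the trackpy documentation (the thread itself doesn't quote the API):

```python
import pandas as pd
import trackpy as tp

# Two frames of synthetic feature positions, just enough to run the call.
features = pd.DataFrame({
    'x': [10.0, 50.0, 10.5, 50.5],
    'y': [10.0, 50.0, 10.2, 50.3],
    'frame': [0, 0, 1, 1],
})

# Start with a generous search range. Whenever a subnet is too hard to
# solve, adaptive search shrinks the range by a factor of adaptive_step
# until linking succeeds or the range falls below adaptive_stop.
trajectories = tp.link_df(features, search_range=10,
                          adaptive_stop=2, adaptive_step=0.95)
print(trajectories)
```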

danielballan commented 9 years ago

The fans can't wait!

> On Sun, Jan 4, 2015 at 7:19 PM Nathan Keim notifications@github.com wrote:
>
> This is a companion PR to one coming in trackpy, that adds the actual adaptive search functionality.
>
> You can view the 2 new notebooks in nbviewer:

tacaswell commented 9 years ago

I really like these!

I also think they are sufficiently hedgy about why this might be a bad idea that if users shoot themselves in the foot, it's their own problem, so :+1: from me.

Hopefully I will find time to go over the code a bit more carefully (it looked like most of the 'changes' were indentation-related).

nkeim commented 9 years ago

I say in the adaptive search notebook that solving a subnet is generally O(N!). But upon further reflection it seems that it would be closer to O(2^N). Is that right?

danielballan commented 9 years ago

Can you elaborate on your reasoning? I always thought it was O(N!) for the number of possible pairs.

danielballan commented 9 years ago

P.S. I also think these notebooks are great. I would not have thought it possible to so clearly explain and demonstrate this very technical enhancement.

tacaswell commented 9 years ago

The exact scaling is going to depend on the details of the network (a line shifted by half the spacing is going to be simpler to deal with than an N -> N all-to-all subnet). In the worst case of all-to-all, if you pick one particle there are N ways to connect it, and that leaves an (N-1) -> (N-1) all-to-all network. The base case is the 2-2 or 1-1 subnet, which has 2 or 1 possible connections (in this case we don't have to worry about new/dying tracks, as the cost of that will always be greater than making a connection), so via recursion you get N!.
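(In code, that argument looks roughly like the toy enumerator below. This is a sketch, not trackpy's actual solver: for an all-to-all N -> N subnet it visits N choices at the top level, N-1 below that, and so on, i.e. N! complete assignments.)

```python
import math

def best_assignment(sources, dests, cost):
    """Exhaustively assign each source particle to a distinct destination,
    minimizing total cost. For an all-to-all subnet this explores
    N * (N-1) * ... * 1 = N! complete assignments."""
    if not sources:
        return 0.0, []
    src = sources[0]
    best = (math.inf, None)
    # N choices for the first particle, then recurse on the
    # (N-1) -> (N-1) subproblem.
    for i, dst in enumerate(dests):
        sub_cost, sub_links = best_assignment(
            sources[1:], dests[:i] + dests[i + 1:], cost)
        total = cost(src, dst) + sub_cost
        if total < best[0]:
            best = (total, [(src, dst)] + sub_links)
    return best

# Example: three particles shifted slightly; cost = squared displacement.
pts0 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pts1 = [(0.1, 0.0), (1.1, 0.0), (2.1, 0.0)]
sq_disp = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
print(best_assignment(pts0, pts1, sq_disp))
```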

The way the possibility of birth/death gets dealt with, I think, can be thought of as adding 'fake' particles to both the source and sink sets with a weight of the search range (which in the N -> N case actually only makes it O((N+1)!)), so with unmatched sets I think it goes as (worst case) O((max(N, M)+1)!).

In practice it's not going to be that bad, as (I think) most of the subnetworks that show up are more like the shifted line than the all-to-all, and the code is clever enough to chop off entire subtrees of the recursion if it can figure out that no connection set in the subtree will beat the current best (because it still has a non-empty subnet left and is already worse than the current best), so the real performance will be nowhere near O(N!).
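(The shortcut described here is ordinary branch-and-bound. A sketch of how it grafts onto the toy enumerator above, again not trackpy's actual numba solver:)

```python
import math

def best_assignment_pruned(sources, dests, cost, partial=0.0, best=math.inf):
    """Same exhaustive search, but give up on a branch the moment the cost
    accumulated so far already matches or exceeds the best complete
    assignment found: one comparison discards the entire subtree below."""
    if not sources:
        return partial
    src = sources[0]
    for i, dst in enumerate(dests):
        candidate = partial + cost(src, dst)
        if candidate >= best:
            continue  # prune: non-empty subnet left, already worse than best
        best = min(best, best_assignment_pruned(
            sources[1:], dests[:i] + dests[i + 1:], cost, candidate, best))
    return best
```

Against a genuinely all-to-all subnet this only trims the tree by a modest factor, but on near-diagonal subnets like the shifted line it collapses most of the N! tree, which is why real performance lands nowhere near the worst case.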

I also convinced myself that this problem is equivalent to a modified travelling salesman problem on a bipartite graph (with 'free' travel 'back', and dummy entries to allow track death/birth), so it is NP-something. Explaining that in more detail may require making diagrams.

tacaswell commented 9 years ago

Also, sorry for the semi-stream-of-consciousness brain dump.

nkeim commented 9 years ago

Thanks. I think I follow you (except for the last bit about the traveling salesman equivalency; that just vaguely makes sense to me). For large subnets I imagine the worst case is very rare, and O(2^N) is closer to typical. This is why, unlike what I calculate in the example notebook, in real life a 30-particle subnet is not 10^20 times slower than a 15-particle subnet. Attempting to solve a subnet with 30 particles is therefore usually safe, but occasionally disastrous.
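(The arithmetic behind that factor, for the record:)

```python
import math

# Worst case, O(N!): going from a 15- to a 30-particle subnet
print(math.factorial(30) // math.factorial(15))  # ~2.0e20

# Typical case, O(2**N): the same doubling costs a factor of only 2**15
print(2**30 // 2**15)                            # 32768
```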

Reflecting this reality, I actually had a termination condition in the numba subnet solver that aborted after a certain (large) number of iterations, but I removed it in the adaptive search PR: without a clear way to decide at the outset whether a given subnet is computationally advisable, adaptive search performs very poorly. Such a termination condition may still be a good idea, but the maximum number of iterations would have to be calculated in a thoughtful way, rather than being a fixed number I made up.
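(For concreteness, the kind of guard under discussion, sketched against the toy solver above. `MAX_ITERATIONS` and this exception class are illustrative stand-ins, not trackpy's internals, and trackpy's real numba solver is non-recursive.)

```python
import math

class SubnetOversizeException(Exception):
    """Illustrative stand-in for the exception the calling function raises."""

MAX_ITERATIONS = 1000000  # a fixed, made-up cap: exactly the weak point

def solve_with_cap(sources, dests, cost):
    iterations = 0

    def recurse(srcs, dsts, partial, best):
        nonlocal iterations
        iterations += 1
        if iterations > MAX_ITERATIONS:
            raise SubnetOversizeException(
                'subnet solver exceeded its iteration budget')
        if not srcs:
            return partial
        for i, dst in enumerate(dsts):
            candidate = partial + cost(srcs[0], dst)
            if candidate >= best:
                continue
            best = min(best, recurse(srcs[1:], dsts[:i] + dsts[i + 1:],
                                     candidate, best))
        return best

    return recurse(list(sources), list(dests), 0.0, math.inf)
```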

This calls for real-world data. Perhaps trackpy should profile the subnet solver, and send results back to a server. ;)

tacaswell commented 9 years ago

I don't quite understand why you would want a depth limit in the subnet solver. Unless you can guarantee you are iterating through the possible linkings in the right order (which we can't, otherwise linking would be a one-step activity), cutting off at some number of attempts seems like it will chop off the best linking.

I really should go back and get my C++ linking code wrapped for Python; now that we have conda, there is a sane path to distribution.

nkeim commented 9 years ago

The numba solver is non-recursive, so by 'iteration limit' I really mean a time limit. And the termination condition resulted in a subnet-oversize exception (raised by the calling function). But I'm once again thinking it's a bad idea: when the oversize condition is reported instantly, it's useful; when it's reported after several minutes, it's needlessly confusing.

nkeim commented 9 years ago

Added a new tutorial on linking diagnostics, another new feature in https://github.com/soft-matter/trackpy/pull/192
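(A sketch of what turning the new feature on might look like, assuming the `diagnostics=True` keyword and the `diag_`-prefixed output columns described in the trackpy documentation; neither is quoted in this thread.)

```python
import pandas as pd
import trackpy as tp

features = pd.DataFrame({
    'x': [10.0, 50.0, 10.5, 50.5],
    'y': [10.0, 50.0, 10.2, 50.3],
    'frame': [0, 0, 1, 1],
})

# With diagnostics on, linking attaches extra per-particle columns
# (prefixed 'diag_') recording how each link decision was made.
t = tp.link_df(features, search_range=10, diagnostics=True)
print([c for c in t.columns if c.startswith('diag_')])
```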

danielballan commented 9 years ago

I built the docs, including the minor fixes in PR #192, and pushed.