correctly utilise dijkstra one to many

graphhopper / map-matching

The map matching functionality is now located in the main repository https://github.com/graphhopper/graphhopper#map-matching

https://www.graphhopper.com/open-source/

Apache License 2.0

785 stars 274 forks source link

correctly utilise dijkstra one to many #82

Closed kodonnell closed 7 years ago

kodonnell commented 7 years ago

It's much faster ... should we warn the user if they're no using the above? I imagine it's easy to not think about it (as I did), and get stuck with much worse performance ...

I was going to suggest updating the doco to reflect this ... but when I checked, it looks like the doco is still for the old API?

If you agree, I'll make the changes ...

Edit: I just realised that we do need to change the computeTransitionProbabilities to actually benefit from dijkstra one-to-many as currently it creates a new algo for every route, and hence doesn't utilise the cache etc. Shouldn't be hard to tweak though. (I didn't notice as I'm working on a bespoke implementation at the moment.)

karussell commented 7 years ago

It's much faster ... should we warn the user if they're no using the above?

What do you mean here? Do you mean the class DijkstraOneToMany? This class is not suited to be used for many scenarios (e.g. does not work with virtual nodes/edges out of the box) or do you mean to use the Dijkstra class with the use cases one-to-many? This is indeed not documented but easy doable

but when I checked, it looks like the doco is still for the old API?

This could be, yes.

kodonnell commented 7 years ago

What do you mean here?

I really just use algorithm(Parameters.Algorithms.DIJKSTRA_ONE_TO_MANY) in the algorithm options, and tweaked computeTransitionProbabilities to reuse the algo for all to candidates.

This class is not suited to be used for many scenarios

Hmmm, I see. Everything just worked in the sense that the code ran through without errors - though I haven't compared the actual route with that created from other algorithms (though initial inspection looks OK).

Do you agree that using the DijkstraOneToMany would be valuable? If so, how feasible? Also, I just realised something - am I right that you'd be cautious of helping as this is effectively creating a distance matrix, which (I believe) is still proprietary?

I'll update the doco - what should we pick as the default algo for now?

karussell commented 7 years ago

I really just use algorithm(Parameters.Algorithms.DIJKSTRA_ONE_TO_MANY) in the algorithm options, and tweaked computeTransitionProbabilities to reuse the algo for all to candidates.

The specific class won't work easily in this scenario. What we could do is to use the Dijkstra class with a set of to points, similar to this code

karussell commented 7 years ago

I'll update the doco - what should we pick as the default algo for now?

Maybe let us fix this here and keep the documentation using one to many?

kodonnell commented 7 years ago

The specific class won't work ...

What do you mean by 'won't work'? It 'worked' for me, and (conceptually) seems to fit.

Maybe let us fix this here and keep the documentation using one to many?

Fixing this first is probably best, yes.

karussell commented 7 years ago

What do you mean by 'won't work'? It 'worked' for me, and (conceptually) seems to fit.

The DijkstraOneToMany has problems in this scenario which a one-to-many implemented Dijkstra would not have:

does not support virtual nodes and edges (yet?)
would require to allocate a new algorithm object for every request, which means you allocate arrays with size in the millions for a world wide graph. The RAM usage could explode for many parallel requests.
no turn cost support

I.e. DijkstraOneToMany is currently designed to work very fast but for CH preparation only.

kodonnell commented 7 years ago

Hmmm, I tried to implement the custom dijkstra you linked above, but it's about twice as slow as DijkstraOneToMany, even when I don't extract any paths.

On the other hand, DijkstraOneToMany still seems to work = ) I don't need turn costs (and nor do most average users, I imagine), so that's fine. It hasn't complained about virtual nodes/edges yet (and seems to be working the same as other algorithms). As for RAM ... I just create one algo, and algo.clear() whenever I want to reset the from node. Either way, I imagine many users (i.e. me) would be happy to use lots of RAM if it's a lot faster.

karussell commented 7 years ago

Can you count the visited nodes for both? This should be identical.

just create one algo, and algo.clear() whenever I want to reset the from node.

Are you using this from a web service where concurrent requests could come in?

kodonnell commented 7 years ago

Can you count the visited nodes for both

Not easily with my current implementation. Want me to submit a rough PR (with a unit test) so we can compare and discuss there?

Are you using this from a web service where concurrent requests could come in?

In production no - it'll be a batch job (running in Spark, which I eventually figured out how to do). If I see where you're going ... DijkstraOneToMany would give faster requests, and possibly even a higher throughput (depending on the situation).

kodonnell commented 7 years ago

DijskstraOneToMany seems to pass the tests for non-CH graphs (except for one about num visited nodes which I don't think is a problem) ... see issue82.

@karussell - thoughts? Once #73 is incorporated we can discuss performance differences ...

karussell commented 7 years ago

If you have a smaller area this might improve performance but otherwise you create many large arrays everytime you create this class (see inside DijskstraOneToMany). And again: virtual nodes could make problems e.g. if 10 virtual nodes are necessary and then you do a clear and then 20 virtual nodes are necessary, then the arrays are too small. (One could create big enough arrays here, not sure)

kodonnell commented 7 years ago

I agree with all you said. I think there are possible cases where it could be useful (especially if it's faster - as it is in my case), but they're probably a minority. So for now I'll close this issue.