route cache for performance?

kodonnell commented 7 years ago

I'm wondering if we should cache routing requests in computing the transition probabilities.

The first reason is obvious - if it's e.g. a long GPX track covering the same location multiple times, then some of the candidate routes may be repeated, etc.

The second reason is maybe a little more involved ...

for candidate-to-candidate routing we only use candidate's closest node
however, we dedupe candidates on edge not node
I believe this is 'correct' in the sense that two candidates are different even if they have the same closest node. @karussell - am I right in thinking this? Anyway, assuming it is correct, it means that for nearby candidates that share the same closest node but different edge, we are repeating the exact same route calculation. Herein lies the second benefit of a route cache.

It should be pretty easy to implement, though we'd probably want #73 implemented first so we can test performance etc.

stefanholder commented 7 years ago

Does LocationIndexMatch.findNClosest return the perpendicular of the passed GPS position on all real edges within gpxAccuracyInMetern? Moreover, if the perpendicular is at the start/end of a real edge does it then return a real node? (Anyway, it would be nice to document this method.)

however, we dedupe candidates on edge not node

All real node candidates should be deduped on node ID to avoid redundant computation. Not sure if we also need to dedupe virtual nodes (but we should if there are duplicates before adding direction). So we could just dedupe all candidates using QueryResult.getClosestNode() before adding direction.

Otherwise, I think it is very unlikely that the same transition occurs multiple times during a map matching so a route cache should not be needed.

kodonnell commented 7 years ago

Does LocationIndexMatch.findNClosest return ...

No idea. I'm using my own implementation, as findNClosest doesn't work for me.

All real node candidates should be deduped on node ID to avoid redundant computation

I thought that too, but see my comment above - we do need both edges as candidates (even if they share the same node). That's why we'd need the route cache, so that when these duplicate (by node ID) edges are routed, we only route once, to avoid redundant computations.

Otherwise, I think it is very unlikely that the same transition occurs multiple times ...

It depends on the size and nature of the route. E.g. for a taxi based at a single stand (e.g. the airport), they'll travel the same roads around the stand many times - and depending on the nature of the roads and the frequency of their GPS measurements, the same transitions may occur often. Similar for e.g. buses. My use case certainly duplicates them, but it's unlikely to be relevant to most. Either way, the route cache should help performance in the general case - and greatly help it if there are many duplicate transitions. So, no only positives, no harm?**

** OK, maybe more memory. Though eventually I'd like to not cache the actual paths (likewise in the timesteps), just the distance/time, etc.

stefanholder commented 7 years ago

we do need both edges as candidates (even if they share the same node)

I don't think so, because we only use the closest node for the candidate (as you wrote before). This is because the routing between subsequent candidates only needs a start and an end node. The closest edge of the QueryResult is never used.

the same transitions may occur often

But only if we get the exact same virtual nodes by another LocationIndexMatch.findNClosest call. Not sure how likely this is. We would need a tolerance to return the same virtual nodes for similar GPS positions.

kodonnell commented 7 years ago

The closest edge of the QueryResult is never used.

I think it is - when we do the lookup for the virtual node. E.g.

   v1    v2
------N-------

Currently, we get two candidates, one with a virtual node somewhere on v1 and one with a virtual node on v2. If we dedupe on closest node (N) then we'll only get one virtual node (on v1 or v2 depending which is closer to the GPX point), and hence we'll eliminate one (valid) candidate node. It's worth noting that these virtual nodes are directly used in the computation of the transition probabilities. However, that also means that our route cache wouldn't help if e.g. we were using the nodes as the key ...

But only if we get the exact same virtual nodes ...

Fair point. Again, I'm thinking of my use case (cell data) where the GPS location are quantized as such, and hence are likely to be repeated. In the generic case, I think you're right.

stefanholder commented 7 years ago

It's worth noting that these virtual nodes are directly used in the computation of the transition probabilities.

Can you please show me where this happens? When calling algo.calcPath, only the virtual nodes are used as input to compute the path. Different calls of this method with different candidates but the same virtual nodes should still give same results.

Or are you referring to PR #83? There we potentially need to dedupe the result of findGPXEntriesInGraph on virtual nodes before adding direction.

kodonnell commented 7 years ago

I think we agree - queryGraph.lookup can replace the closest node with a virtual one, which may then be an input to algo.calcPath, and this will always calculate the same path for the same (virtual or not) nodes. However, my point is that you'll be missing a candidate edge. Back to my picture, say we have the following

  v1     v2                       u1     u2
------N-------                  ------M-------

with N/M virtual nodes (GPX points) and v1/v2/u1/u2 virtual edges (well pairs of them). If we dedupe on closest node, we might end up with just these edges:

  v1                        u2
------N                  M-------

when there are actually three other possible arrangements. Whether or not it's OK to ignore these three, I'm not sure ... but if feels like we shouldn't (especially if we're going to be doing directional stuff e.g. #83). I also think I tried it (it's a one-liner replacement) and the tests failed - not 100% sure, but I'll check tomorrow if you haven't confirmed.

stefanholder commented 7 years ago

queryGraph.lookup can replace the closest node with a virtual one

My understanding is that a GPS point is not replaced by a virtual node but that the closest (virtual) point of a GPS point is the perpendicular of the GPS point on the real edge. So different GPS points might have the same closest virtual node and this virtual node is then used in both cases for routing. @karussell, can you please clarify?

karussell commented 7 years ago

I'm not sure if the QueryGraph already does the optimization that two GPS points create only one virtual node if it is the same, I would have to try this.

My understanding is that a GPS point is not replaced by a virtual node but that the closest (virtual) point of a GPS point is the perpendicular of the GPS point on the real edge.

A GPS point is nothing we feed the algorithm, so we cannot replace nodes with it. Instead a GPS point is looked up with the QueryGraph which highly likely creates a virtual node (and 4 virtual edges) and sets the 'closest node' property of QueryResult.

          X
          |
A <------ V <---- B
  ------>   ---->

So X is the measurement via lat,lon ('GPS point') and creates a snapped point on the edge A-B (lat,lon), this snapped point has an associated virtual node V and is inserted with 4 new virtual edges. A and B are real nodes.

The GPS point highly likely creates no virtual nodes and edges if the measurement is not 'perpendicular' relative to the edge like for 'endstanding' edges e.g.

         X
        /
A------B

stefanholder commented 7 years ago

Thanks @karussell! Does LocationIndexMatch.findNClosest return the snapped position of the GPS position to each real edge within gpxAccuracyInMetern?

karussell commented 7 years ago

Yes, this is true for mostly all cases. But it can also return results outside of gpxAccuracyInMetern e.g. if otherwise the set would be empty. So it returns any closest match but in most cases there will be matches within the limit

stefanholder commented 7 years ago

queryGraph.lookup can replace the closest node with a virtual one, which may then be an input to algo.calcPath, and this will always calculate the same path for the same (virtual or not) nodes.

Correct. Sorry, I first misread your sentence.

However, my point is that you'll be missing a candidate edge.

If we do the following, I don't think we're missing candidate edges:

Get all query results from findNClosest
Call QueryGraph.lookup with all query results
Obtain the list of closest nodes from all query results
Dedupe this list on node id
Fall each virtual node in the list, create a candidate for each direction.

kodonnell commented 7 years ago

Maybe we should wait until we see what form #83 takes?

stefanholder commented 7 years ago

Maybe we should wait until we see what form #83 takes?

Yes, this makes sense.

stefanholder commented 7 years ago

@karussell: I have a PR ready for deduping all query results by node id after calling QueryGraph.lookup. Should I create this PR now or wait until this repository is merged into the graphhopper repository?

karussell commented 7 years ago

Nice - looking forward to this!

I have a PR ready for deduping all query results by node id after calling QueryGraph.lookup

Will this reuse existing virtual nodes, or in which sense did you mean deduping?

Should I create this PR now or wait until this repository is merged into the graphhopper repository?

Please do not wait for my action. Currently too many things on the table.

stefanholder commented 7 years ago

Will this reuse existing virtual nodes, or in which sense did you mean deduping?

This will only dedupe multiple occurrences of the same tower node.

graphhopper / map-matching

route cache for performance? #81