prevent inner-link U-turns

kodonnell commented 7 years ago

See #70

WIP. Things to note (@stefanholder)

until here is resolved I just manually set unfavored statuses
not sure whether what I've done is correct (e.g. use of edges and direction).
MapMatchingTest2 includes the example original in #70, plus a failing test. Most of the other tests fail (I think I know why, but we can worry about that later)

I'm still not sure how the logic is going to work. To clarify, consider

     v1    v2 
 A------X------B

with nodes A,B and a GPX position at X (i.e. the virtual node), and virtual edges v1 and v2. What is being done in this PR is creating two candidates for this GPX point: both with closestNode = X, but one with incoming=v1 and outgoing=v2, and the other with that reversed. When we route, we simply unfavor the incoming/outgoing edge to force the other to be used. E.g. if X is the from candidate, then we'd have

candidate 1: incoming=v1 (unfavored), outgoing=v2

     v1    v2 
 AxxxxxxX--->--B

candidate 2: incoming=v2 (unfavored), outgoing=v1

     v1    v2 
 A--<---XxxxxxxB

We can do similar for the to candidate but unfavor the outgoing edge.

However, how do we actually prevent inner-link U-turns with this? When X is the to timestep, we have a candidate

     v1    v2 
 A-->---XxxxxxxB

and when it's the from timestep we have a candidate

     v1    v2 
 A--<---XxxxxxxB

in other words the inner-link U-turn is still possible:

 A-->---XxxxxxxB
 A--<---XxxxxxxB

@stefanholder - how were you planning to get around this when you said:

If a candidate is a virtual node + direction, we can enforce the direction during routing with QueryGraph.enforceHeadingByEdgeId. We can store the direction as the outgoing virtual edge id.

Since we don't know the direction of the 'right' to candidate we can't enforce the 'right' direction to start with.

If you agree it's not resolvable, then I think an 'online' (i.e. step by step) solution might be best: we calculate the best sequence up to the give point, and the we can enforce that as the direction at the start of the route to all other candidates.

karussell commented 7 years ago

Can you try with the map data at map-data/issue-70.osm.gz instead of a fresh map? I expect that the more recent map variant changes also how things match.

stefanholder commented 7 years ago

Can you try with the map data at map-data/issue-70.osm.gz instead of a fresh map?

Still get same results (with smaller node IDs).

kodonnell commented 7 years ago

with smaller node ID

FYI issue-70.osm.gz is just an export from OSM of the region directly surrounding the relevant route. So, while it's locally the same road network, the actual graph in GH may differ, and affect the results. (For example, the edges are likely to experience different contraction, etc.)

can you please explain to me how to use the same routing settings in MapMatchingMain.java as in testIssue70

I haven't used main before, but the setup is quite different to that of the test.

karussell commented 7 years ago

Indeed, can you try the command line again and remove the two subnetwork properties from the arguments and increase max_visited_nodes to e.g. 5000?

For example, the edges are likely to experience different contraction

The default setting is a disabled CH. Furthermore if the input OSM is the same, the graph should be identical including the contraction, see https://github.com/graphhopper/graphhopper/issues/556

stefanholder commented 7 years ago

Indeed, can you try the command line again and remove the two subnetwork properties from the arguments and increase max_visited_nodes to e.g. 5000?

After removing the two subnetwork property arguments, I got the following exception during the map import of issue-70.osm.gz:

2016-12-07 10:42:55,701 [main] WARN  com.graphhopper.storage.BaseGraph - More than a half of the network should be removed!? Nodes:65, remove:65
Exception in thread "main" java.lang.IllegalStateException: graph is empty after in-place removal but was 65
        at com.graphhopper.storage.BaseGraph.inPlaceNodeRemove(BaseGraph.java:704)
        at com.graphhopper.storage.GraphHopperStorage.optimize(GraphHopperStorage.java:219)
        at com.graphhopper.routing.subnetwork.PrepareRoutingSubnetworks.doWork(PrepareRoutingSubnetworks.java:90)
        at com.graphhopper.GraphHopper.cleanUp(GraphHopper.java:1150)
        at com.graphhopper.GraphHopper.process(GraphHopper.java:693)
        at com.graphhopper.GraphHopper.importOrLoad(GraphHopper.java:663)
        at com.graphhopper.matching.MapMatchingMain.start(MapMatchingMain.java:69)
        at com.graphhopper.matching.MapMatchingMain.main(MapMatchingMain.java:45)

karussell commented 7 years ago

After removing the two subnetwork property arguments

Ok, then it was probably not the reason. BTW: the subnetwork properties remove too small islands, and having a 0 there means no removal.

But as there are only a few parameters you should be able to identify the differences: weighting (default is fastest), vehicle, max visited nodes, gps accuracy, algorithm (should be bidir dijkstra) and I guess all subnetwork properties are 0 in the tests too., so that they don't remove anything.

karussell commented 7 years ago

I can also have a look into the cause

stefanholder commented 7 years ago

I can also have a look into the cause

Thanks. I will try first but maybe will get back to you.

stefanholder commented 7 years ago

Interestingly, running unit test testIssue70 and matching the same GPX file from the command line using the entire Serbia map gives different matching results even if sigma and beta are the same.

The problem was that I used mm-uturns1.gpx for the command line and issue-70.gpx for the unit test. The GPS coordinates are the same in both files but the in mm-uturns1.gpx all timestamps are the same, whereas in issue-70.gpx timestamps are 60 s apart.

For mm-uturns1.gpx (with 60 s timestamp differences) the penalizing of routing paths via QueryGraph.enforceHeadingByEdgeId works as it should but mm-uturns2.gpx still has strange U-turns. I will further investigate this.

kodonnell commented 7 years ago

whereas in issue-70.gpx timestamps are 60 s apart

Ah yes, I vaguely remember doing that, as it wasn't working (and didn't make sense) with all the timestamps the same.

As an aside, while thinking up something else, I thought of a different method for completely preventing inner-link U-turns: only route from (real) tower nodes to tower nodes. This will obviously prevent inner-link U-turns (which require virtual nodes) and I think it shouldn't affect map-matching: we'd have to take care with the first/last points of the sequence, but others doesn't really need virtual nodes. E.g. in

---T1----X1----T2----X2----T3----X3---T4---

with Ti tower nodes and Xi virtual nodes, we get the same map-matched route going from X1->X2->X3 as we do from T1->T2->T3->T4. That's a simplistic example - in the general case we may have to do something like have two candidates for each edge (i.e. the base/adj node) instead of just the single virtual one. (Assuming it is virtual - some may, of course, already be tower nodes.) I can go into more detail if required.

That said:

@stefanholder might already have a solution
there are maybe 'better' ways e.g. here, but they're harder to implement - this shouldn't require changing too much.

Thoughts?

stefanholder commented 7 years ago

Ah yes, I vaguely remember doing that, as it wasn't working (and didn't make sense) with all the timestamps the same.

If timestamps are the same, then map matching as it is currently implemented doesn't work at all, because then all transitions have zero probability. Actually, it would be better to throw an exception in this case or use the non-normalized metric.

As an aside, while thinking up something else, I thought of a different method for completely preventing inner-link U-turns: only route from (real) tower nodes to tower nodes.

It's an interesting approach, which is simple to implement and should still solve the U-turn problem. However, I would expect that the matching quality gets worse in general because in some cases emission and transition probabilities would be biased:

an edge close to the GPS position can have tower nodes that are both far apart from the GPS position, whereas an an edge more far away from the GPS position can have tower nodes that are rather close to the GPS position. Hence, the candidates of the closer edge would receive a lower emission probability than they should.
the transition probability metric also uses the linear distance between both GPS positions. For cases where tower edges are far away from the GPS position the transition probability would be lower than it should be.

kodonnell commented 7 years ago

Actually, it would be better to throw an exception

Or, even better, have a customisable 'maximum allowed speed' between events (e.g. 200km/h). This would fail on not only time diff = 0, but also similarly small ones. And, even better, don't throw an exception, but (optionally) break the sequence here and start a new one. At least, this is how I'm doing it currently. I may try and get my code into a PR today.

I would expect that the matching quality gets worse in general because in some cases emission and transition probabilities would be biased

Those are both good points!

the emission probability metric ...

What about using the same distance in the emission probability (i.e. the perpendicular distance of the snapped point to the GPX point). I think that makes sense in the context of HMM emission probabilities.

the transition [note my correction] probability metric ...

Maybe we could replace the linear distance between GPX tracks with the linear distance between the nodes? I think this makes sense as currently it's really used to favour straight routes (i.e. the shortest), which would still be true if we used tower-to-tower linear distance.

stefanholder commented 7 years ago

What about using the same distance in the emission probability (i.e. the perpendicular distance of the snapped point to the GPX point). I think that makes sense in the context of HMM emission probabilities.

We would then have the same emission probability for both tower nodes of the edge and hence the Viterbi algorithm would have no clue where we really are on this edge. Moreover, if we replaced the linear distance between GPS positions with the linear distance between the candidate nodes then we would have very low transition probabilities between both tower node candidates of the same edge no matter where the GPS positions actually are. This would most likely lead to worse map matching results, e.g. going back and forth between the tower nodes of same edge in the map matching result.

In general, an HMM tries to find the correct hidden states for known observations. The better our modelling of hidden states reflects reality, the better results we will get. So I think we should stick to directed virtual node candidates.

karussell commented 7 years ago

So I think we should stick to directed virtual node candidates.

We could more easily represent the 'directed nodes' here, if we use the edge based traversal instead (and fix the 4 edges problem with QueryGraph too of course).

kodonnell commented 7 years ago

We would then have the same emission probability for both tower nodes of the edge and hence the Viterbi algorithm would have no clue where we really are on this edge.

Maybe I should rephrase - in my head, I'm effectively thinking in terms of only allowing complete real edges - that's what I mean by routing between real tower nodes. (Sorry, I think I went down a rabbit-hole in response to your questions.) Anyway, I don't think anything needs to change - we still use the snapped distance in the emission probabilities, and same linear distance - but we simply don't call queryGraph.lookup, to ensure that we only have real edges. (All of the above is ignoring first/last edges.) I may have missed some crucial point here though ...

This would most likely lead to worse map matching results ... The better our modelling of hidden states reflects reality, the better results we will get

I wonder - is there an easy way we can score our algorithm? We've got unit tests, but they're only pass/fail. I've seen in some papers that they compare the map-matched route with the actual route (including by down-sampling the input GPX track) - this would give us a metric for how 'good' our map-matching actually is. And then we could easily settle such matters = ) @stefanholder have you ever done this sort of error quantification? Or know of any relevant data sets? Maybe a new issue (hopefully I'm not raising too many!).

I think we should stick to directed virtual node candidates

I don't want to sound like a broken record, but as above I'm still not sure a first order HMM with the standard viterbi algorithm will ever be able to handle this. I am most happy to be wrong, and I think it's a good solution if I am = )

karussell commented 7 years ago

but as above I'm still not sure a first order HMM with the standard viterbi algorithm will ever be able to handle this

Instead of nodes we can route with edges and this should solve this. The only problem in my head would be indifferent solutions where the snap is exactly on a junction node.

kodonnell commented 7 years ago

Instead of nodes we can route with edges and this should solve this.

Agreed - as long as we use directed edges?

The only problem in my head would be indifferent solutions where the snap is exactly on a junction node.

Maybe, in that case, we add all possible edges coming from that junction node, and leave viterbi to choose which is 'best'?

stefanholder commented 7 years ago

Good news: I managed to prevent/penalize inner-link u-turns using QueryGraph.enforceHeadingByEdgeId. The GPX files mm-uturns1.txt and mm-uturns2.txt from original forum post are now matched without U-turns. Moreover, all unit tests are successful except one, where I need your help. The changes are not that complicated but needed some time because I had to solve some other issues as well. I will publish this after cleaning up everything.

I've seen in some papers that they compare the map-matched route with the actual route (including by down-sampling the input GPX track) - this would give us a metric for how 'good' our map-matching actually is. And then we could easily settle such matters = ) @stefanholder have you ever done this sort of error quantification? Or know of any relevant data sets? Maybe a new issue (hopefully I'm not raising too many!).

The authors of the paper, our HMM map matching approach is based on, provide a GPS test track and the ground truth path, along with a road network in a custom format: http://research.microsoft.com/en-us/um/people/jckrumm/MapMatchingData/data.htm. Moreover, it's easy to create own GPS test tracks and corresponding ground truth paths. I think setting up a map matching quality benchmark deserves a new issue.

Maybe, in that case [snap is exactly on a junction node], we add all possible edges coming from that junction node, and leave viterbi to choose which is 'best'?

This would work but penalizing inner-link U-turns seems to solve most problems. However, we could try this out in a separate PR.

stefanholder commented 7 years ago

I just created PR #88 with my changes. I did this as a separate PR because I rewrote the git history of this PR and I didn't want to mess up things. I think we can then close this PR.

karussell commented 7 years ago

Thanks @stefanholder !

kodonnell commented 7 years ago

I think setting up a map matching quality benchmark deserves a new issue.

Agreed - see #89.

I just created PR #88 with my changes. I did this as a separate PR because I rewrote the git history of this PR and I didn't want to mess up things. I think we can then close this PR.

Nice work! Agreed - will close.

graphhopper / map-matching

prevent inner-link U-turns #83