Closed kodonnell closed 7 years ago
Can you try with the map data at map-data/issue-70.osm.gz instead of a fresh map? I expect that the more recent map variant changes also how things match.
Can you try with the map data at map-data/issue-70.osm.gz instead of a fresh map?
Still get same results (with smaller node IDs).
with smaller node ID
FYI issue-70.osm.gz is just an export from OSM of the region directly surrounding the relevant route. So, while it's locally the same road network, the actual graph in GH may differ, and affect the results. (For example, the edges are likely to experience different contraction, etc.)
can you please explain to me how to use the same routing settings in MapMatchingMain.java as in testIssue70
I haven't used main before, but the setup is quite different to that of the test.
Indeed, can you try the command line again and remove the two subnetwork properties from the arguments and increase max_visited_nodes to e.g. 5000?
For example, the edges are likely to experience different contraction
The default setting is a disabled CH. Furthermore if the input OSM is the same, the graph should be identical including the contraction, see https://github.com/graphhopper/graphhopper/issues/556
Indeed, can you try the command line again and remove the two subnetwork properties from the arguments and increase max_visited_nodes to e.g. 5000?
After removing the two subnetwork property arguments, I got the following exception during the map import of issue-70.osm.gz:
2016-12-07 10:42:55,701 [main] WARN com.graphhopper.storage.BaseGraph - More than a half of the network should be removed!? Nodes:65, remove:65
Exception in thread "main" java.lang.IllegalStateException: graph is empty after in-place removal but was 65
at com.graphhopper.storage.BaseGraph.inPlaceNodeRemove(BaseGraph.java:704)
at com.graphhopper.storage.GraphHopperStorage.optimize(GraphHopperStorage.java:219)
at com.graphhopper.routing.subnetwork.PrepareRoutingSubnetworks.doWork(PrepareRoutingSubnetworks.java:90)
at com.graphhopper.GraphHopper.cleanUp(GraphHopper.java:1150)
at com.graphhopper.GraphHopper.process(GraphHopper.java:693)
at com.graphhopper.GraphHopper.importOrLoad(GraphHopper.java:663)
at com.graphhopper.matching.MapMatchingMain.start(MapMatchingMain.java:69)
at com.graphhopper.matching.MapMatchingMain.main(MapMatchingMain.java:45)
After removing the two subnetwork property arguments
Ok, then it was probably not the reason. BTW: the subnetwork properties remove too small islands, and having a 0 there means no removal.
But as there are only a few parameters you should be able to identify the differences: weighting (default is fastest), vehicle, max visited nodes, gps accuracy, algorithm (should be bidir dijkstra) and I guess all subnetwork properties are 0 in the tests too., so that they don't remove anything.
I can also have a look into the cause
I can also have a look into the cause
Thanks. I will try first but maybe will get back to you.
Interestingly, running unit test testIssue70 and matching the same GPX file from the command line using the entire Serbia map gives different matching results even if sigma and beta are the same.
The problem was that I used mm-uturns1.gpx for the command line and issue-70.gpx for the unit test. The GPS coordinates are the same in both files but the in mm-uturns1.gpx all timestamps are the same, whereas in issue-70.gpx timestamps are 60 s apart.
For mm-uturns1.gpx (with 60 s timestamp differences) the penalizing of routing paths via QueryGraph.enforceHeadingByEdgeId
works as it should but mm-uturns2.gpx still has strange U-turns. I will further investigate this.
whereas in issue-70.gpx timestamps are 60 s apart
Ah yes, I vaguely remember doing that, as it wasn't working (and didn't make sense) with all the timestamps the same.
As an aside, while thinking up something else, I thought of a different method for completely preventing inner-link U-turns: only route from (real) tower nodes to tower nodes. This will obviously prevent inner-link U-turns (which require virtual nodes) and I think it shouldn't affect map-matching: we'd have to take care with the first/last points of the sequence, but others doesn't really need virtual nodes. E.g. in
---T1----X1----T2----X2----T3----X3---T4---
with Ti tower nodes and Xi virtual nodes, we get the same map-matched route going from X1->X2->X3 as we do from T1->T2->T3->T4. That's a simplistic example - in the general case we may have to do something like have two candidates for each edge (i.e. the base/adj node) instead of just the single virtual one. (Assuming it is virtual - some may, of course, already be tower nodes.) I can go into more detail if required.
That said:
Thoughts?
Ah yes, I vaguely remember doing that, as it wasn't working (and didn't make sense) with all the timestamps the same.
If timestamps are the same, then map matching as it is currently implemented doesn't work at all, because then all transitions have zero probability. Actually, it would be better to throw an exception in this case or use the non-normalized metric.
As an aside, while thinking up something else, I thought of a different method for completely preventing inner-link U-turns: only route from (real) tower nodes to tower nodes.
It's an interesting approach, which is simple to implement and should still solve the U-turn problem. However, I would expect that the matching quality gets worse in general because in some cases emission and transition probabilities would be biased:
Actually, it would be better to throw an exception
Or, even better, have a customisable 'maximum allowed speed' between events (e.g. 200km/h). This would fail on not only time diff = 0, but also similarly small ones. And, even better, don't throw an exception, but (optionally) break the sequence here and start a new one. At least, this is how I'm doing it currently. I may try and get my code into a PR today.
I would expect that the matching quality gets worse in general because in some cases emission and transition probabilities would be biased
Those are both good points!
the emission probability metric ...
What about using the same distance in the emission probability (i.e. the perpendicular distance of the snapped point to the GPX point). I think that makes sense in the context of HMM emission probabilities.
the transition [note my correction] probability metric ...
Maybe we could replace the linear distance between GPX tracks with the linear distance between the nodes? I think this makes sense as currently it's really used to favour straight routes (i.e. the shortest), which would still be true if we used tower-to-tower linear distance.
What about using the same distance in the emission probability (i.e. the perpendicular distance of the snapped point to the GPX point). I think that makes sense in the context of HMM emission probabilities.
We would then have the same emission probability for both tower nodes of the edge and hence the Viterbi algorithm would have no clue where we really are on this edge. Moreover, if we replaced the linear distance between GPS positions with the linear distance between the candidate nodes then we would have very low transition probabilities between both tower node candidates of the same edge no matter where the GPS positions actually are. This would most likely lead to worse map matching results, e.g. going back and forth between the tower nodes of same edge in the map matching result.
In general, an HMM tries to find the correct hidden states for known observations. The better our modelling of hidden states reflects reality, the better results we will get. So I think we should stick to directed virtual node candidates.
So I think we should stick to directed virtual node candidates.
We could more easily represent the 'directed nodes' here, if we use the edge based traversal instead (and fix the 4 edges problem with QueryGraph too of course).
We would then have the same emission probability for both tower nodes of the edge and hence the Viterbi algorithm would have no clue where we really are on this edge.
Maybe I should rephrase - in my head, I'm effectively thinking in terms of only allowing complete real edges - that's what I mean by routing between real tower nodes. (Sorry, I think I went down a rabbit-hole in response to your questions.) Anyway, I don't think anything needs to change - we still use the snapped distance in the emission probabilities, and same linear distance - but we simply don't call queryGraph.lookup
, to ensure that we only have real edges. (All of the above is ignoring first/last edges.) I may have missed some crucial point here though ...
This would most likely lead to worse map matching results ... The better our modelling of hidden states reflects reality, the better results we will get
I wonder - is there an easy way we can score our algorithm? We've got unit tests, but they're only pass/fail. I've seen in some papers that they compare the map-matched route with the actual route (including by down-sampling the input GPX track) - this would give us a metric for how 'good' our map-matching actually is. And then we could easily settle such matters = ) @stefanholder have you ever done this sort of error quantification? Or know of any relevant data sets? Maybe a new issue (hopefully I'm not raising too many!).
I think we should stick to directed virtual node candidates
I don't want to sound like a broken record, but as above I'm still not sure a first order HMM with the standard viterbi algorithm will ever be able to handle this. I am most happy to be wrong, and I think it's a good solution if I am = )
but as above I'm still not sure a first order HMM with the standard viterbi algorithm will ever be able to handle this
Instead of nodes we can route with edges and this should solve this. The only problem in my head would be indifferent solutions where the snap is exactly on a junction node.
Instead of nodes we can route with edges and this should solve this.
Agreed - as long as we use directed edges?
The only problem in my head would be indifferent solutions where the snap is exactly on a junction node.
Maybe, in that case, we add all possible edges coming from that junction node, and leave viterbi to choose which is 'best'?
Good news: I managed to prevent/penalize inner-link u-turns using QueryGraph.enforceHeadingByEdgeId
. The GPX files mm-uturns1.txt and mm-uturns2.txt from original forum post are now matched without U-turns. Moreover, all unit tests are successful except one, where I need your help. The changes are not that complicated but needed some time because I had to solve some other issues as well. I will publish this after cleaning up everything.
I've seen in some papers that they compare the map-matched route with the actual route (including by down-sampling the input GPX track) - this would give us a metric for how 'good' our map-matching actually is. And then we could easily settle such matters = ) @stefanholder have you ever done this sort of error quantification? Or know of any relevant data sets? Maybe a new issue (hopefully I'm not raising too many!).
The authors of the paper, our HMM map matching approach is based on, provide a GPS test track and the ground truth path, along with a road network in a custom format: http://research.microsoft.com/en-us/um/people/jckrumm/MapMatchingData/data.htm. Moreover, it's easy to create own GPS test tracks and corresponding ground truth paths. I think setting up a map matching quality benchmark deserves a new issue.
Maybe, in that case [snap is exactly on a junction node], we add all possible edges coming from that junction node, and leave viterbi to choose which is 'best'?
This would work but penalizing inner-link U-turns seems to solve most problems. However, we could try this out in a separate PR.
I just created PR #88 with my changes. I did this as a separate PR because I rewrote the git history of this PR and I didn't want to mess up things. I think we can then close this PR.
Thanks @stefanholder !
I think setting up a map matching quality benchmark deserves a new issue.
Agreed - see #89.
I just created PR #88 with my changes. I did this as a separate PR because I rewrote the git history of this PR and I didn't want to mess up things. I think we can then close this PR.
Nice work! Agreed - will close.
See #70
WIP. Things to note (@stefanholder)
I'm still not sure how the logic is going to work. To clarify, consider
with nodes A,B and a GPX position at X (i.e. the virtual node), and virtual edges v1 and v2. What is being done in this PR is creating two candidates for this GPX point: both with closestNode = X, but one with incoming=v1 and outgoing=v2, and the other with that reversed. When we route, we simply unfavor the incoming/outgoing edge to force the other to be used. E.g. if X is the
from
candidate, then we'd havecandidate 1: incoming=v1 (unfavored), outgoing=v2
candidate 2: incoming=v2 (unfavored), outgoing=v1
We can do similar for the
to
candidate but unfavor the outgoing edge.However, how do we actually prevent inner-link U-turns with this? When X is the
to
timestep, we have a candidateand when it's the
from
timestep we have a candidatein other words the inner-link U-turn is still possible:
@stefanholder - how were you planning to get around this when you said:
Since we don't know the direction of the 'right'
to
candidate we can't enforce the 'right' direction to start with.If you agree it's not resolvable, then I think an 'online' (i.e. step by step) solution might be best: we calculate the best sequence up to the give point, and the we can enforce that as the direction at the start of the route to all other candidates.