graphhopper / map-matching

map matching 'quality' test suite #89

Open · kodonnell opened this issue 7 years ago

kodonnell commented 7 years ago

As discussed here, it'd be good to have a way of testing how 'good' our map matching is by comparing it with known results. (I.e. our map matching on a GPX track gives a certain route; how closely does that align with the actual route taken?)

Considerations/comments (to direct a PR):

karussell commented 7 years ago

where do we get data of known routes?

There are many taxi tracks from NYC (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml), and there are many on OSM itself (http://wiki.openstreetmap.org/wiki/Planet.gpx) or on GPSies.

karussell commented 7 years ago

The performance test suite could also be used for this, e.g. counting the errors and differences of the random routes, or using different randomness, etc.

stefanholder commented 7 years ago

how do we compare with the known route? Compare total distances of each? Or the proportion of the distance on which they share common edges?

We should check to what degree the matched route goes through the same sequence of edges/nodes as the expected route. To this end we could compute the Levenshtein distance as a similarity measure between the expected and the actual node/edge sequence.
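
For illustration, a minimal sketch of a plain Levenshtein distance over integer edge IDs (the class name and the use of bare edge IDs are assumptions for this example, not existing code):

```java
import java.util.List;

/**
 * Sketch: Levenshtein distance between the expected and the matched
 * sequence of edge IDs. Lower is better; 0 means identical sequences.
 */
public class SequenceSimilarity {

    public static int levenshtein(List<Integer> expected, List<Integer> actual) {
        int m = expected.size(), n = actual.size();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // delete everything from expected
        for (int j = 0; j <= n; j++) d[0][j] = j; // insert everything from actual
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int subst = expected.get(i - 1).equals(actual.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,          // deletion
                        d[i][j - 1] + 1),         // insertion
                        d[i - 1][j - 1] + subst); // substitution (0 if edges match)
            }
        }
        return d[m][n];
    }
}
```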

stefanholder commented 7 years ago

To this end we could compute the Levenshtein distance

Newson & Krumm suggest a different metric in Chapter 5, which is simpler but probably still sufficient: map matching error = (d_minus + d_plus) / d0

With this definition it is possible to get an error greater than 1, but I don't think that this is a real problem.

Another useful variant would be to use the number of road segments for d_minus, d_plus and d0 instead of their lengths.
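
A sketch of how this metric could be computed once the three quantities are known (all names here are made up for this example; the same formula covers the segment-count variant):

```java
/**
 * Sketch of the Newson & Krumm error metric:
 *   dMinus - length of the correct route erroneously subtracted by the match
 *   dPlus  - length erroneously added by the match
 *   d0     - total length of the correct route
 * Pass segment counts instead of lengths for the count-based variant.
 */
public class MapMatchingError {
    public static double newsonKrumm(double dMinus, double dPlus, double d0) {
        // can exceed 1.0 for very poor matches, as noted above
        return (dMinus + dPlus) / d0;
    }
}
```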

kodonnell commented 7 years ago

Agreed that, assuming 'normal' behaviour, it is probably fine. If it's not 'normal' - e.g. non-sequential - then this could be a very poor metric. A Levenshtein approach may handle these cases better.

That said, we may need to write something similar to, but not quite, Levenshtein. For example, transpositions are allowed (in the Damerau-Levenshtein variant) with the same cost as adding/removing. This makes sense in e.g. spelling correction, but probably not in routes - edges have the additional constraint of connectivity (which words largely don't, I think). So we'd either need to not consider transpositions at all, or treat them differently. Something similar applies to substituting edges. So it's possible that our Levenshtein could end up only really considering additions/subtractions (in the 'normal' case), which is equivalent to @karussell's suggestion.

As an aside - if we do go with Levenshtein, we should weight by edge length, not just count edges.
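
Combining this with the previous point, an insert/delete-only edit distance weighted by edge length can be computed via a max-weight common subsequence DP. A minimal sketch (the Edge class and all names are made up for this example, not GraphHopper types):

```java
import java.util.List;

/**
 * Sketch: Levenshtein variant restricted to additions/removals
 * (no substitutions or transpositions), weighted by edge length.
 */
public class WeightedEditDistance {

    public static class Edge {
        final int id;
        final double lengthMeters;
        public Edge(int id, double lengthMeters) {
            this.id = id;
            this.lengthMeters = lengthMeters;
        }
    }

    /**
     * Minimal total length of edges removed from 'expected' plus edges
     * added to obtain 'actual'. Computed as
     * total(expected) + total(actual) - 2 * (max-weight common subsequence).
     */
    public static double cost(List<Edge> expected, List<Edge> actual) {
        int m = expected.size(), n = actual.size();
        double[][] keep = new double[m + 1][n + 1]; // max length of edges kept in both
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (expected.get(i - 1).id == actual.get(j - 1).id)
                    keep[i][j] = keep[i - 1][j - 1] + expected.get(i - 1).lengthMeters;
                else
                    keep[i][j] = Math.max(keep[i - 1][j], keep[i][j - 1]);
            }
        }
        double totalExpected = expected.stream().mapToDouble(e -> e.lengthMeters).sum();
        double totalActual = actual.stream().mapToDouble(e -> e.lengthMeters).sum();
        return totalExpected + totalActual - 2 * keep[m][n];
    }
}
```

In the 'normal' case the removed and added lengths here play roughly the roles of d_minus and d_plus above, which is the equivalence just noted.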

Another thought - what about the area between the curves? We'd probably want time as a dimension (so the area includes differences in time), meaning that even being at the right point isn't necessarily good unless you're also there at the same time. (Actually, the latter is something we'd need to consider in the above approaches too.) In terms of calculating the area, we can probably come up with a suitable proxy similar to numerical integration - e.g. if both the route and the ground truth had 100 nodes, we'd just sum up the difference (including time) of each node pair.
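
A minimal sketch of that proxy, assuming both tracks have already been resampled to the same number of (lat, lon, time) points and that a time difference can be converted into a distance penalty via a speed factor (both assumptions for this example):

```java
/**
 * Sketch: numerical-integration proxy for the "area between the curves",
 * including time. Each point is {latDeg, lonDeg, timeSeconds}; both
 * arrays must have the same length, e.g. after resampling to 100 points.
 */
public class TrackAreaProxy {

    public static double error(double[][] route, double[][] truth, double metersPerSecond) {
        double sum = 0;
        for (int i = 0; i < route.length; i++) {
            double spatial = haversineMeters(route[i][0], route[i][1], truth[i][0], truth[i][1]);
            // time difference converted to a distance penalty
            double temporal = Math.abs(route[i][2] - truth[i][2]) * metersPerSecond;
            sum += spatial + temporal;
        }
        return sum / route.length; // mean deviation per sampled point pair
    }

    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double r = 6371000; // mean earth radius in meters
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }
}
```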