stefanholder closed this pull request 7 years ago.
The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?
When finally merging this PR, if you want to squash commits please only squash commits of the same author and please do not squash the entire PR so that it is still clear which commits were contributed by which author.
I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't intend to remove the authorship or anything. We could explicitly add this here.
The problem is also that GitHub offers this as the default option, which I should change now. Also, squashing all commits into one is a lot simpler than doing this author by author; maybe we should aim for just very few commits per PR and avoid squashing at all.
I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't intend to remove the authorship or anything.
No problem. Actually, it didn't happen to my commits, but I know that the GitHub squashing feature exists and I wanted to make you and other maintainers aware of the consequences.
maybe we should aim for just very few commits per PR and avoid squashing at all.
Sure, just let me know if you want me to squash my commits. For reviewing, it is sometimes better to have more commits, e.g. having refactorings in separate commits. This also allows easily reverting certain commits.
Ok. I think the only improvement that squashing brings is that a new feature is condensed into as few commits as possible so that it can also be easily reverted. But as I didn't use squashing ~6 months ago, I think we can revert to this behaviour without tweaking authorship.
The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?
Looks like retriggering solved this, but we need a new GH core version deployed beforehand.
First - nice work @stefanholder! Looks like it was pretty involved, so thanks for sticking with it. If you'd like, I'm happy to review, though it may be a few days. @karussell - can you let me know whether you'd prefer me to review this, or work on #87?
if you want to squash commits
I still don't know what squashing is, so you're safe from me!
@karussell - can you let me know whether you'd prefer me to review this, or work on #87?
I'd prioritize this one here but this should not be in my power - please do what you think is best or fun :) !
However, @stefanholder can you verify you also get the same results as below for mm-uturns2?
I get the same results if I use issue-70.osm.gz as map data. The reason is that the road that the map matcher should match to is not contained in the map file. The trace matches correctly with the entire Serbia map.
@stefanholder - is it possible to filter the issue 70 OSM data to only include relevant stuff (as per here)?
The size of issue-70.osm.gz is 11 KB, so is this really a concern? If yes, could you do this in another commit on branch issue70 and maybe also include the map data for mm-uturns2? I will then squash this commit into your first commit in this PR.
I updated my commits with the changes discussed above and squashed some commits. If you want to see only what has changed, you can compare with branch issue70-penalize-paths-v0.1. The comparison doesn't work in the GitHub UI because the two branches have diverged, but it works in IntelliJ, for example.
I would like to get feedback from @michaz, do you (still) think this could/should be solved differently?
No no, I still like it. My concerns were more about putting other distance-like things (like time) instead of distance into the transition probabilities.
Just so I understand this correctly, and for future efforts: The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that
1a.) QueryGraph is the way it is.
2a.) We want to allow inner-link U-turns, just with low probability.
Similarly, the special case for "matching to a real node" is
1b.) because QueryGraph is the way it is.
2b.) because in the cases where this happens, it may give better performance
3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
So, if we could tell QueryGraph to always create a pair of directed edge-matches (2 * (virtual node + 2 virtual edges)), it would fix 3.) and be more straightforward, but at the cost of missing 2a) and 2b)?
2a.) We want to allow inner-link U-turns, just with low probability.
Yes, I think it is this case
Similarly, the special case for "matching to a real node" is
Here kind of both apply. We could also improve QueryGraph, but avoiding the creation of virtual nodes is good for performance, especially for multiple hundreds of points. But we should probably test this before using it as an argument :)
We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
Here @stefanholder uses the 'unfavouring' stuff to clear and avoid interference, I think. Still, making QueryGraph stateful removes the possibility of doing the matching on multiple threads (with the same QueryGraph).
So, if we could tell QueryGraph to always create a pair of directed edge-matches
You mean always creating virtual nodes&edges?
The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that 1a.) QueryGraph is the way it is. 2a.) We want to allow inner-link U-turns, just with low probability.
The main reason is that during candidate generation we don't yet know which the wrongly-directed virtual edges are. This is determined by the Viterbi algorithm after the router has computed the penalized path lengths. Moreover, both directed candidates can be valid choices even without any U-turn if the road is not a dead end.
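As an illustrative sketch only (the names below are made up, not the actual GraphHopper API): candidate generation has to emit both directions for every virtual node, because only the Viterbi step can later decide, from the penalized path lengths, which directed candidate was the "wrong" one:

```java
import java.util.ArrayList;
import java.util.List;

public class DirectedCandidates {
    enum Direction { FORWARD, REVERSE }

    static final class Candidate {
        final int virtualNode;
        final Direction dir;
        Candidate(int virtualNode, Direction dir) {
            this.virtualNode = virtualNode;
            this.dir = dir;
        }
        @Override public String toString() { return virtualNode + "/" + dir; }
    }

    /**
     * Candidate generation cannot filter by direction yet, so it emits
     * both directed candidates for every virtual node.
     */
    static List<Candidate> expand(List<Integer> virtualNodes) {
        List<Candidate> candidates = new ArrayList<>();
        for (int node : virtualNodes) {
            candidates.add(new Candidate(node, Direction.FORWARD));
            candidates.add(new Candidate(node, Direction.REVERSE));
        }
        return candidates;
    }

    public static void main(String[] args) {
        // two virtual nodes near a GPX entry -> four directed candidates
        System.out.println(DirectedCandidates.expand(List.of(7, 8)).size()); // 4
    }
}
```

The key point is that `expand` cannot drop the "wrong" direction here, because that information only exists after routing.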
Similarly, the special case for "matching to a real node" is 1b.) because QueryGraph is the way it is. 2b.) because in the cases this happens, it may give better performance
I would say because we first wanted to address inner-link U-turns in this PR and not U-turns at real intersections. However, this shouldn't be hard to do in another PR if really needed.
3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
Additional virtual nodes in the QueryGraph from other candidates shouldn't be a problem because
You mean always creating virtual nodes&edges?
I mean, instead of the QueryGraph doing this and then doing the direction-thing in the map matching code...
...maybe have the QueryGraph do this and then do nothing more in the map matching code.
And yes, treat all the matches like this. No special case for being EPSILON next to a real node.
Later, I mean. Not now.
...maybe have the QueryGraph do this
Interesting idea, this would make virtual edges truly unidirectional. From my understanding of GraphHopper internals, this would mean splitting the flags field into two flags fields, one for each direction.
But as you wrote before, this would completely prevent inner-link U-turns instead of penalizing these. Since penalizing U-turns is more powerful than preventing U-turns, I think we should stick with penalties for inner-link U-turns.
Interesting. I also favour penalizing U-turns over completely avoiding them. Also, wouldn't we introduce a bit of complexity due to the two candidates '7a' and '7b' instead of one?
We could improve clarity in a later PR by fixing QueryGraph to use just two bidirectional edges (instead of 4 unidirectional edges) and using the edge-based traversal mode; then we could use the TurnWeighting to avoid U-turns without special virtual edge handling in the map matching core, I guess. The cost would be that the full algorithm becomes roughly two times slower, but for motor vehicles we have to consider turn costs anyway.
Alright, I'm just saying: The whole QueryGraph-virtual-node-virtual-edge business was just what I found to be there when I did the prototype and noticed that GraphHopper generally "thinks undirectedly" (which I don't; I always think of road segments as two opposing directed edges, and storing them as one data record is just a compressed representation of that, which should ideally be abstracted away in the layer directly above data storage, wherever possible).
I did not intend it to model U-turns, this is just what happened.
If I had found a way in QueryGraph to get positions on directed-edge-like-things, I would have immediately used that.
then we could use the TurnWeighting to avoid u-turns without a special virtual edge handling in the map matching core I guess
Also with TurnWeighting we would need two candidates per virtual node, one for each direction, because we don't know the correct direction at candidate generation time (see above). Hence, I'm not sure what the benefit of TurnWeighting would be. Still, it would be nice to support all traversal modes for map matching.
I did not intend it to model U-turns, this is just what happened.
Sure, that's what I thought.
@karussell the following question from above is still open, can you please answer?
When looking at the unit tests I was wondering why the map match results are only checked for the correct sequence of street names but not for the correct sequence of real nodes/real edges, which is suboptimal when there are no street names, such as here. Is there a reason for this?
but not for the correct sequence of real nodes / real edges, which is suboptimal when there are no street names such as here. Is there a reason for this?
I'm not 100% sure about the reason besides being simpler to debug. But it could be the following: although nodes & edges are stable if we do not change the source map, they could still change if something changes during import, so checking just street names is more robust. The best solution, which I would also like to include in the public web API, would be to use OSM node IDs, which are okay to debug and relatively stable (OSM way IDs are not that stable).
I updated my commits with the changes discussed above. The previous commits can be found in branch issue70-penalize-paths-v0.2.
Please let me know with a +1 and I'll merge, @michaz & @kodonnell.
@karussell, your last comment got thumbs up from both @michaz and @kodonnell (there is no email notification for this). Not sure if this is what you meant with +1.
Thanks!
there is no email notification for this
How ugly. They could at least give a notification in the web UI... this is what Discourse does (in general it has a much better notification system...).
Merged - thanks again all involved :) !
How ugly.
Yes, it's probably better that all reviewers approve via GitHub pull request reviews.
As an aside, as #73 was merged before this (thanks @karussell), we've got the following two measurement results. (We can't easily go back further.)
commits:
--------------------------
3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths
measurements:
-------------
3cd6476 d3cf21e
------- -------
location_index_match.max 10.756949 5.571446
location_index_match.mean 0.3402709212 0.19900762400000002
location_index_match.min 0.001734 0.001277
location_index_match.sum 1701.354606 995.03812
map_match.max 1645.143509 2409.088762
map_match.mean 127.85747168 274.09888926
map_match.min 1.564703 3.390409
map_match.sum 12785.747168 27409.888926
measurement.count 5000 5000
measurement.seed 123 123
measurement.time 19071 36710
measurement.totalMB 1529 1524
measurement.usedMB 50 37
So, roughly speaking, the locationIndex is a little under twice as fast, but the map-matching is a little over twice as slow.
This looks strange. Did you run it twice and were the results consistent? Maybe there are some problems in the measurement suite. E.g. why should the location index be faster?
Did you run it twice and were the results consistent?
No - I'd assumed all the warm-up and multiple testing would make it OK - as below, I was wrong =) Not sure if it's a bug... though maybe there were varying demands on the system (it's just my laptop) or some such. Anyway, I've run it multiple times, and consistently got results similar to below:
commits:
--------------------------
3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths
measurements:
-------------
3cd6476 d3cf21e
------- -------
location_index_match.max 5.239315 5.63816
location_index_match.mean 0.1950378346 0.19318890019999999
location_index_match.min 0.001261 0.001284
location_index_match.sum 975.189173 965.944501
map_match.max 1619.534054 2392.89043
map_match.mean 127.55285619 271.12588989
map_match.min 1.563381 3.044777
map_match.sum 12755.285619 27112.588989
measurement.count 5000 5000
measurement.seed 123 123
measurement.time 18018 35912
measurement.totalMB 1526 1522
measurement.usedMB 37 37
That said, about half the time the used MB is like above, and the other half, the latest commit is 50% higher.
So ... locationIndex is the same (as expected) though map matching is slower (which is expected, given we're increasing the number of candidates). For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?
No - I'd assumed all the warm-up and multiple testing would make it OK
It should. I expect the measurement suite is suboptimal somewhere
E.g. the while loop is something I do not like much. With the new algorithm every matching should work, I think, and we should throw an error for problems.
For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?
It depends a bit on whether we understand the underlying issue. We should try as hard as possible to avoid the slowdown; otherwise we end up with powerful but slow software at some point.
which is expected, given we're increasing the number of candidates
Are we increasing them?
Are we increasing them?
My understanding is that every virtual node (a single candidate before) becomes two directed candidates (virtual node + direction). Assuming every node is a virtual one, that's twice as many candidates and four times as many transitions (which is likely to be the slow part, as it's the routing). The Viterbi processing will also be slowed down due to the increased number of candidates (and may require more memory to store everything in the maps). At this stage, I'm assuming we're only seeing a 2.5x slowdown (as opposed to 4x or more) because not every candidate is virtual.
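The growth estimate above can be checked with a back-of-envelope sketch (illustrative only; `transitions` is a hypothetical helper, assuming one routing query per pair of candidates of consecutive GPX entries):

```java
public class TransitionGrowth {
    // one routing query per (candidate, nextCandidate) pair of
    // consecutive GPX entries
    static long transitions(int candidatesPerEntry, int entries) {
        return (long) (entries - 1) * candidatesPerEntry * candidatesPerEntry;
    }

    public static void main(String[] args) {
        int entries = 100;
        // 3 undirected candidates per entry before the change
        long undirected = transitions(3, entries);
        // each candidate becomes 2 directed candidates after the change
        long directed = transitions(2 * 3, entries);
        System.out.println(undirected); // 891
        System.out.println(directed);   // 3564, i.e. 4x as many routing queries
    }
}
```

Doubling the candidates per entry quadruples the routing queries, which matches the "four times as many transitions" estimate above.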
the while loop is something that I do not like much. With the new algorithm we should make every matching working I think and throw an error for problems.
Do you mean this one? Agreed, with #87 we should (?) hopefully never fail - at worst, we get back a list of single-point sequences. That said, another option is to tweak MiniPerfTest to (optionally) catch exceptions and count/report them in the final stats, and then we could exclude the while loop. I'd only do that if it was useful in other situations.
Assuming every node is a virtual one, then that's twice as many candidates, and four times as many transitions
Oops, indeed. BTW: we should add the one-to-many "cache" to make this a lot faster (I hope). Is this what was meant with #81?
Do you mean this one?
yes
Agreed, with #87 we should (?) hopefully never fail
Then I'd prefer the procedure you describe (throwing vs. counting errors) and then (a slightly modified) measurement suite could also act as a quality indicator like in #89
Was this meant with #81 ?
I'm not sure what you mean, sorry.
Then I'd prefer
OK, I'll try to remember to tweak it as part of #87.
OK, I'll try to remember to tweak it as part of #87
Better in a separate PR
We should use a one-to-many Dijkstra algorithm, which only aborts after all target nodes have been found. I think this is what was meant with #82. @karussell: Is this also what you meant with the one-to-many "cache" above?
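For illustration, here is a minimal self-contained sketch of such a one-to-many Dijkstra (this is not GraphHopper's actual DijkstraOneToMany; the adjacency-list representation and integer weights are assumptions for brevity). It only stops once all target nodes are settled, so a single search can serve all transitions leaving one candidate:

```java
import java.util.*;

public class OneToManyDijkstra {
    /** graph[v] = list of {neighbor, weight} pairs; returns target -> shortest distance. */
    static Map<Integer, Integer> oneToMany(List<int[]>[] graph, int source, Set<Integer> targets) {
        int[] dist = new int[graph.length];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        queue.add(new int[]{source, 0});
        Set<Integer> remaining = new HashSet<>(targets);
        Map<Integer, Integer> result = new HashMap<>();
        // keep searching until ALL targets are settled, not just the first one
        while (!queue.isEmpty() && !remaining.isEmpty()) {
            int[] top = queue.poll();
            int node = top[0], d = top[1];
            if (d > dist[node]) continue; // stale queue entry
            if (remaining.remove(node)) result.put(node, d);
            for (int[] edge : graph[node]) {
                int next = edge[0], nd = d + edge[1];
                if (nd < dist[next]) {
                    dist[next] = nd;
                    queue.add(new int[]{next, nd});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]>[] g = new List[4];
        for (int i = 0; i < 4; i++) g[i] = new ArrayList<>();
        g[0].add(new int[]{1, 2}); g[0].add(new int[]{2, 5});
        g[1].add(new int[]{2, 1}); g[1].add(new int[]{3, 7});
        g[2].add(new int[]{3, 3});
        // distances from node 0: node 2 -> 3 (via 1), node 3 -> 6 (via 1 and 2)
        System.out.println(oneToMany(g, 0, Set.of(2, 3)));
    }
}
```

Compared with one point-to-point query per transition, this amortizes the search: all targets reachable from one candidate are collected in a single Dijkstra run.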
@karussell: Is this also what you meant with one-to-many "cache" above?
I meant exactly this, yes. But I'm not sure about the intentions of the issue creator :)
But I'm not sure of intentions of the issue creator :)
The intent of #82 was to use DijkstraOneToMany.