stefanholder closed this pull request 7 years ago.
The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?
When finally merging this PR, if you want to squash commits please only squash commits of the same author and please do not squash the entire PR so that it is still clear which commits were contributed by which author.
I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't intend to remove the authorship or anything. We could explicitly add this here.
The problem is also that GitHub offers this as the default option, which I should change now. Also, squashing all commits into one is a lot simpler than doing this author by author; maybe we should aim for just very few commits per PR and avoid squashing at all.
I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't intend to remove the authorship or anything.
No problem. Actually, it didn't happen to my commits, but I know that the GitHub squashing feature exists and I wanted to make you and other maintainers aware of the consequences.
maybe we should aim for just very few commits per PR and avoid squashing at all.
Sure, just let me know if you want me to squash my commits. For reviewing, it is sometimes better to have more commits, e.g. having refactorings in separate commits. This also allows easily reverting certain commits.
Ok. I think the only improvement that squashing brings is that a new feature is condensed into as few commits as possible so that it can also be easily reverted. But as I didn't use squashing ~6 months ago, I think we can revert to this behaviour without tweaking authorship.
The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?
Looks like retriggering solved this, but we need a new GH core version deployed beforehand.
First - nice work @stefanholder! Looks like it was pretty involved, so thanks for sticking with it. If you'd like, I'm happy to review, though it may be a few days. @karussell - can you let me know whether you'd prefer me to review this, or work on #87?
if you want to squash commits
I still don't know what squashing is, so you're safe from me!
@karussell - can you let me know whether you'd prefer me to review this, or work on #87?
I'd prioritize this one here but this should not be in my power - please do what you think is best or fun :) !
However, @stefanholder can you verify you also get the same results as below for mm-uturns2?
I get the same results if I use issue-70.osm.gz as map data. The reason is that the road that the map matcher should match to is not contained in the map file. The trace matches correctly with the entire Serbia map.
@stefanholder - is it possible to filter the issue 70 OSM data to only include relevant stuff (as per here)?
The size of issue-70.osm.gz is 11 KB, so is this really a concern? If yes, could you do this in another commit on branch issue70 and maybe also include the map data for mm-uturns2? I will then squash this commit into your first commit in this PR.
I updated my commits with the changes discussed above and squashed some commits. If you want to see only what has changed, you can compare with branch issue70-penalize-paths-v0.1. The comparison doesn't work in the GitHub UI because the two branches have diverged, but it works in IntelliJ, for example.
I would like to get feedback from @michaz, do you (still) think this could/should be solved differently?
No no, I still like it. My concerns were more about putting other distance-like things (like time) instead of distance into the transition probabilities.
Just so I understand this correctly, and for future efforts: The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that
1a.) QueryGraph is the way it is.
2a.) We want to allow inner-link U-turns, just with low probability.
Similarly, the special case for "matching to a real node" is
1b.) because QueryGraph is the way it is.
2b.) because in the cases where this happens, it may give better performance
3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
So, if we could tell QueryGraph to always create a pair of directed edge-matches (2 * (virtual node + 2 virtual edges)), it would fix 3.) and be more straightforward, but at the cost of missing 2a) and 2b)?
2a.) We want to allow inner-link U-turns, just with low probability.
Yes, I think it is this case
Similarly, the special case for "matching to a real node" is
Here kind of both apply. We could also improve QueryGraph, but avoiding the creation of virtual nodes is good for performance, especially for multiple hundreds of points. But we should probably test this before using it as an argument :)
We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
Here @stefanholder uses the 'unfavouring' stuff to clear and avoid interference, I think. Still, making QueryGraph stateful removes the possibility of doing the matching on multiple threads (with the same QueryGraph).
So, if we could tell QueryGraph to always create a pair of directed edge-matches
You mean always creating virtual nodes&edges?
The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that 1a.) QueryGraph is the way it is. 2a.) We want to allow inner-link U-turns, just with low probability.
The main reason is that during candidate generation we don't yet know which the wrongly-directed virtual edges are. This is determined by the Viterbi algorithm after the router has computed the penalized path lengths. Moreover, both directed candidates can be valid choices even without any U-turn if the road is not a dead end.
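As an illustrative sketch only (the names below are made up, not the actual GraphHopper API): candidate generation has to emit both directions for every virtual node, because only the Viterbi step can later decide, from the penalized path lengths, which directed candidate was the "wrong" one:

```java
import java.util.ArrayList;
import java.util.List;

public class DirectedCandidates {
    enum Direction { FORWARD, REVERSE }

    static final class Candidate {
        final int virtualNode;
        final Direction dir;
        Candidate(int virtualNode, Direction dir) {
            this.virtualNode = virtualNode;
            this.dir = dir;
        }
        @Override public String toString() { return virtualNode + "/" + dir; }
    }

    /**
     * Candidate generation cannot filter by direction yet, so it emits
     * both directed candidates for every virtual node.
     */
    static List<Candidate> expand(List<Integer> virtualNodes) {
        List<Candidate> candidates = new ArrayList<>();
        for (int node : virtualNodes) {
            candidates.add(new Candidate(node, Direction.FORWARD));
            candidates.add(new Candidate(node, Direction.REVERSE));
        }
        return candidates;
    }

    public static void main(String[] args) {
        // two virtual nodes near a GPX entry -> four directed candidates
        System.out.println(DirectedCandidates.expand(List.of(7, 8)).size()); // 4
    }
}
```

The key point is that `expand` cannot drop the "wrong" direction here, because that information only exists after routing.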
Similarly, the special case for "matching to a real node" is 1b.) because QueryGraph is the way it is. 2b.) because in the cases this happens, it may give better performance
I would say because we first wanted to address inner-link U-turns in this PR and not U-turns at real intersections. However, this shouldn't be hard to do in another PR if really needed.
3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.
Additional virtual nodes in the QueryGraph from other candidates shouldn't be a problem because
You mean always creating virtual nodes&edges?
I mean, instead of the QueryGraph doing this and then doing the direction-thing in the map matching code...
...maybe have the QueryGraph do this and then do nothing more in the map matching code.
And yes, treat all the matches like this. No special case for being EPSILON next to a real node.
Later, I mean. Not now.
...maybe have the QueryGraph do this
Interesting idea, this would make virtual edges truly unidirectional. From my understanding of GraphHopper internals, this would mean splitting the flags field into two flags fields, one for each direction.
But as you wrote before, this would completely prevent inner-link U-turns instead of penalizing these. Since penalizing U-turns is more powerful than preventing U-turns, I think we should stick with penalties for inner-link U-turns.
Interesting. I also favour penalizing U-turns over completely avoiding them. Also, wouldn't we introduce a bit of complexity due to the two candidates '7a' and '7b' instead of one?
We could improve clarity in a later PR by fixing QueryGraph to use just two bidirectional edges (instead of 4 unidirectional edges) and using the edge-based traversal mode; then we could use the TurnWeighting to avoid U-turns without special virtual edge handling in the map matching core, I guess. The cost would be that the full algorithm becomes roughly two times slower, but for motor vehicles we have to consider turn costs anyway.
Alright, I'm just saying: The whole QueryGraph-virtual-node-virtual-edge business was just what I found to be there when I did the prototype and noticed that GraphHopper generally "thinks undirectedly" (which I don't; I always think of road segments as two opposing directed edges, and storing them as one data record is just a compressed representation of that, which should ideally be abstracted away in the layer directly above data storage, wherever possible).
I did not intend it to model U-turns, this is just what happened.
If I had found a way in QueryGraph to get positions on directed-edge-like-things, I would have immediately used that.
then we could use the TurnWeighting to avoid u-turns without a special virtual edge handling in the map matching core I guess
Also with TurnWeighting we would need two candidates per virtual node, one for each direction, because we don't know the correct direction at candidate generation time (see above). Hence, I'm not sure what the benefit of TurnWeighting would be. Still, it would be nice to support all traversal modes for map matching.
I did not intend it to model U-turns, this is just what happened.
Sure, that's what I thought.
@karussell the following question from above is still open, can you please answer?
When looking at the unit tests I was wondering why the map match results are only checked for the correct sequence of street names but not for the correct sequence of real nodes/real edges, which is suboptimal when there are no street names, such as here. Is there a reason for this?
but not for the correct sequence of real nodes / real edges, which is suboptimal when there are no street names such as here. Is there a reason for this?
I'm not 100% sure about the reason besides being simpler to debug. But it could be the following: although nodes & edges are stable if we do not change the source map, they could still change if something changes during import, so checking just street names is more robust. The best solution, which I would also like to include in the public web API, would be to use OSM node IDs, which are okay to debug and relatively stable (OSM way IDs are not that stable).
I updated my commits with the changes discussed above. The previous commits can be found in branch issue70-penalize-paths-v0.2.
Please let me know with a +1 and I'll merge, @michaz & @kodonnell.
@karussell, your last comment got thumbs up from both @michaz and @kodonnell (there is no email notification for this). Not sure if this is what you meant with +1.
Thanks!
there is no email notification for this
How ugly. They could at least give a notification in the web UI... this is what Discourse does (in general it has a much better notification system...).
Merged - thanks again all involved :) !
How ugly.
Yes, it's probably better that all reviewers approve via GitHub pull request reviews.
As an aside, as #73 was merged before this (thanks @karussell), we've got the following two measurement results. (We can't easily go back further.)
commits:
--------------------------
3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths
measurements:
-------------
3cd6476 d3cf21e
------- -------
location_index_match.max 10.756949 5.571446
location_index_match.mean 0.3402709212 0.19900762400000002
location_index_match.min 0.001734 0.001277
location_index_match.sum 1701.354606 995.03812
map_match.max 1645.143509 2409.088762
map_match.mean 127.85747168 274.09888926
map_match.min 1.564703 3.390409
map_match.sum 12785.747168 27409.888926
measurement.count 5000 5000
measurement.seed 123 123
measurement.time 19071 36710
measurement.totalMB 1529 1524
measurement.usedMB 50 37
So, roughly speaking, the locationIndex is a little under twice as fast, but the map-matching is a little over twice as slow.
This looks strange. Did you run it twice and were the results consistent? Maybe there are some problems in the measurement suite. E.g. why should the location index be faster?
Did you run it twice and were the results consistent?
No - I'd assumed all the warm-up and multiple testing would make it OK - as below, I was wrong =) Not sure if it's a bug... though maybe there were varying demands on the system (it's just my laptop) or some such. Anyway, I've run it multiple times, and consistently got results similar to below:
commits:
--------------------------
3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths
measurements:
-------------
3cd6476 d3cf21e
------- -------
location_index_match.max 5.239315 5.63816
location_index_match.mean 0.1950378346 0.19318890019999999
location_index_match.min 0.001261 0.001284
location_index_match.sum 975.189173 965.944501
map_match.max 1619.534054 2392.89043
map_match.mean 127.55285619 271.12588989
map_match.min 1.563381 3.044777
map_match.sum 12755.285619 27112.588989
measurement.count 5000 5000
measurement.seed 123 123
measurement.time 18018 35912
measurement.totalMB 1526 1522
measurement.usedMB 37 37
That said, about half the time the used MB is like above, and the other half, the latest commit is 50% higher.
So ... locationIndex is the same (as expected) though map matching is slower (which is expected, given we're increasing the number of candidates). For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?
No - I'd assumed all the warm-up and multiple testing would make it OK
It should. I expect the measurement suite is suboptimal somewhere
E.g. the while loop is something I do not like much. With the new algorithm every matching should work, I think, and we should throw an error for problems.
For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?
It depends a bit on whether we understand the underlying issue. We should try as hard as possible to avoid the slowdown; otherwise we end up with powerful but slow software at some point.
which is expected, given we're increasing the number of candidates
Are we increasing them?
Are we increasing them?
My understanding is that every virtual node (a single candidate before) becomes two directed candidates (virtual node + direction). Assuming every node is a virtual one, that's twice as many candidates and four times as many transitions (which is likely to be the slow part, as it's the routing). The Viterbi processing will also be slowed down due to the increased number of candidates (and may require more memory to store everything in the maps). At this stage, I'm assuming we're only seeing a 2.5x slowdown (as opposed to 4x or more) because not every candidate is virtual.
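The growth estimate above can be checked with a back-of-envelope sketch (illustrative only; `transitions` is a hypothetical helper, assuming one routing query per pair of candidates of consecutive GPX entries):

```java
public class TransitionGrowth {
    // one routing query per (candidate, nextCandidate) pair of
    // consecutive GPX entries
    static long transitions(int candidatesPerEntry, int entries) {
        return (long) (entries - 1) * candidatesPerEntry * candidatesPerEntry;
    }

    public static void main(String[] args) {
        int entries = 100;
        // 3 undirected candidates per entry before the change
        long undirected = transitions(3, entries);
        // each candidate becomes 2 directed candidates after the change
        long directed = transitions(2 * 3, entries);
        System.out.println(undirected); // 891
        System.out.println(directed);   // 3564, i.e. 4x as many routing queries
    }
}
```

Doubling the candidates per entry quadruples the routing queries, which matches the "four times as many transitions" estimate above.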
the while loop is something that I do not like much. With the new algorithm we should make every matching working I think and throw an error for problems.
Do you mean this one? Agreed, with #87 we should (?) hopefully never fail - at worst, we get back a list of single-point sequences. That said, another option is to tweak MiniPerfTest to (optionally) catch exceptions and count/report them in the final stats, and then we could exclude the while loop. I'd only do that if it was useful in other situations.
Assuming every node is a virtual one, then that's twice as many candidates, and four times as many transitions
Oops, indeed. BTW: we should add the one-to-many "cache" to make this a lot faster (I hope). Is this what was meant with #81?
Do you mean this one?
yes
Agreed, with #87 we should (?) hopefully never fail
Then I'd prefer the procedure you describe (throwing vs. counting errors) and then (a slightly modified) measurement suite could also act as a quality indicator like in #89
Was this meant with #81 ?
I'm not sure what you mean, sorry.
Then I'd prefer
OK, I'll try to remember to tweak it as part of #87.
OK, I'll try to remember to tweak it as part of #87
Better in a separate PR
We should use a one-to-many Dijkstra algorithm, which only aborts after all target nodes have been found. I think this is what was meant with #82. @karussell: Is this also what you meant with the one-to-many "cache" above?
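For illustration, here is a minimal self-contained sketch of such a one-to-many Dijkstra (this is not GraphHopper's actual DijkstraOneToMany; the adjacency-list representation and integer weights are assumptions for brevity). It only stops once all target nodes are settled, so a single search can serve all transitions leaving one candidate:

```java
import java.util.*;

public class OneToManyDijkstra {
    /** graph[v] = list of {neighbor, weight} pairs; returns target -> shortest distance. */
    static Map<Integer, Integer> oneToMany(List<int[]>[] graph, int source, Set<Integer> targets) {
        int[] dist = new int[graph.length];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        queue.add(new int[]{source, 0});
        Set<Integer> remaining = new HashSet<>(targets);
        Map<Integer, Integer> result = new HashMap<>();
        // keep searching until ALL targets are settled, not just the first one
        while (!queue.isEmpty() && !remaining.isEmpty()) {
            int[] top = queue.poll();
            int node = top[0], d = top[1];
            if (d > dist[node]) continue; // stale queue entry
            if (remaining.remove(node)) result.put(node, d);
            for (int[] edge : graph[node]) {
                int next = edge[0], nd = d + edge[1];
                if (nd < dist[next]) {
                    dist[next] = nd;
                    queue.add(new int[]{next, nd});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]>[] g = new List[4];
        for (int i = 0; i < 4; i++) g[i] = new ArrayList<>();
        g[0].add(new int[]{1, 2}); g[0].add(new int[]{2, 5});
        g[1].add(new int[]{2, 1}); g[1].add(new int[]{3, 7});
        g[2].add(new int[]{3, 3});
        // distances from node 0: node 2 -> 3 (via 1), node 3 -> 6 (via 1 and 2)
        System.out.println(oneToMany(g, 0, Set.of(2, 3)));
    }
}
```

Compared with one point-to-point query per transition, this amortizes the search: all targets reachable from one candidate are collected in a single Dijkstra run.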
@karussell: Is this also what you meant with one-to-many "cache" above?
I meant exactly this, yes. But I'm not sure about the intentions of the issue creator :)
But I'm not sure of intentions of the issue creator :)
The intent of #82 was to use DijkstraOneToMany.