added find radial method to address #65

kodonnell commented 8 years ago

See discussion at #65.

Notes:

this is my first pull request (ever), so apologies if I've done something silly! Feel free to let me know how things work best for you guys, as I've got no idea.
I'm also new to Java.
I doubt many people will need the minDistance feature (though I do, for cell network stuff where we only know a radius as opposed to a point). It's easy to keep, so I figure there's no harm.
I've had to tweak the tests to get them working. I think they're more "correct" now. (Generally they're to do with setting the "correct" measurementSigmaError.) However, please check!
I'm not sure how fast it is. I imagine it'll be comparable to the old method (assuming similar measurementSigmaError), as even for a maxRadius of zero, it'll still search 9 tiles.
it's probably not 100% accuracte in that it's probably possible for e.g. an edge to have a minimum distance less than minRadius (so excluded), but for part of the edge to be outside minRadius (so it should be included).
similarly, it can probably be refactored further down so that instead of relying on the closest edge distance (which presumably requires looping through all line segments of and egde and calculating their distance), to just loop through until it meets the first line segment that satisfies the condition.

@karussell, am I correct with the last two points?

kodonnell commented 8 years ago

Ah, I forgot to mention that I had to make a couple of GH methods public/protected instead of private (e.g. findNetworkEntriesSingleRegion) - that's why it's failing, I think. I'm not sure how to "merge" that with GH itself ... @karussell?

karussell commented 8 years ago

I'm not sure about the exact purpose of this PR. Can you summarize what it does?

And why always adding to queryResults - wouldn't it increase endlessly, shouldn't it be limited to a certain number?

kodonnell commented 8 years ago

The method findEdgesWithinDistance finds all edges within a certain distance of a point. This is to replace findNClosest which:

has a misleading name
doesn't find all edges within measurementSigmaError as one would expect (as per #65)
is less efficient (in theory?) for larger radii as it tests all tiles within a rectangular bounding box, as opposed to all tiles within the radial distance..

It also seems to give more reasonable results (based on the test suite examples).

And why always adding to queryResults - wouldn't it increase endlessly, shouldn't it be limited to a certain number?

Isn't that also the behaviour of findNClosest?

With map matching, if one has a measurementSigmaError of 100m, then one wants to find all edges within 100m of the GPX entry. That's what this does (I hope!) - it will add all edges within 100m. So, if the user e.g. sets measurementSigmaError to 1000km, then yes, they'll get a very large list. However, that's up to the user, in the same way it's up to the user to set a single tile in the LocationIndex (a rather bad idea), if they wish.

Anyway, it would be easy to limit it to the top N matches as it already returns the edges sorted by distance. (We could optimise it further to e.g. break as soon as it's found N.) This is probably a nice feature, as excessive candidates slow the map-matching down quite considerably ... so it allows the user to balance the most thorough approach (e.g. testing all candidates) with speed (e.g. only testing the top three candidates.) Would you like me to add it?

karussell commented 8 years ago

has a misleading name

Yes, this should be fixed

doesn't find all edges within measurementSigmaError as one would expect (as per #65)

When I think about it: it should return all edges, if the starting points are in the reach (first and second iteration) which we can expect for the most map matching use cases. So what it mainly does: it does a location lookup and then adds edges via BreadthFirstSearch until a certain distance, which should be more efficient compared to do just location lookups. As lookups can contain duplicates and a graph traversal not.

The only problem at the moment are case where there are no entries found within the first&second iteration, but we could just increase this iteration count and I'm unsure what triggered your other changes. E.g. why is XFirstSearchCheck still necessary if you already have all the points?

And something which we have to fix elsewhere: implement a basic lookup and map matching performance measurements to see how changes performs. Identical to what we've done for normal routing already.

Isn't that also the behaviour of findNClosest?

Hmmh, not sure. Where does the new method overwrite the entry if it is closer? See the old "// overwrite old edge with current"

kodonnell commented 8 years ago

When I think about it: it should return all edges, if the starting points are in the reach (first and second iteration) which we can expect for the most map matching use cases

I disagree - I think it should return all edges, if the starting points are within the specified GPS error. This is a superset of your statement, and covers additional cases (i.e. outside first/second iteration). In most map matchin use cases (with small GPS error), it will still only ever test the first/second iteration, so will be equivalent.

So what it mainly does: it does a location lookup and then adds edges via BreadthFirstSearch until a certain distance, which should be more efficient compared to do just location lookups. As lookups can contain duplicates and a graph traversal not.

The only problem at the moment are case where there are no entries found within the first&second iteration, but we could just increase this iteration count and I'm unsure what triggered your other changes. E.g. why is XFirstSearchCheck still necessary if you already have all the points?

I don't know = ) The current method simply increases the size of the rectangle, doing lookups as it goes. However, it only carries out two iterations, at most, so doesn't handle larger GPS error - which was the problem for me. Initially, I could have just increased the iteration count, as you say. All I've done is make this dynamic (i.e. it keeps iterating until it's reached the bounds), and optimised it to be radial - e.g. if you've got bounds of 10km, you don't need to do location lookups on the corner tiles of the 20kmx20km bounding box. Other than that, I just follow the existing example (hence why XFirstSeachCheck is still there, etc.)

A graph traversal sounds much better - find the closest node to the query, then iterate out until you're out of bounds. Any example code I can look at for graph traversal from a point?

Where does the new method overwrite the entry if it is closer?

Ah, right. I just ignored duplicate edges, as I thought an edge was and edge. But I think I see now that while the edge may be the same, the query distance may be closer (depending which tile it came from, etc.). I can (easily) implement this, but I'd prefer to do the graph traversal above.

karussell commented 8 years ago

Any example code I can look at for graph traversal from a point?

it is already there: XFirstSearchCheck

But this is also not as easy as it sounds ;)

karussell commented 8 years ago

I disagree - I think it should return all edges, if the starting points are within the specified GPS error.

Yes, extending the loop should be easy. Probably we need to split this PR and take careful steps for a better location lookup in general.

I think this will be important and could lead to better readable code, faster lookup and more flexible approach. E.g. we should limit by radius and by a certain number of 'snaps' and recognize #57. Also the previous lookup always included minimal one snap, even if it was out of the specified radius, although I'm not sure why this was required.

All these changes I would schedule after the 0.8 release as we need to get people started with the new map matching algorithm and also #15 and #66 are very important. And before we can change more here, we need more tests (as indicated via 'I'm not sure'). Otherwise I would fear that we can break something. And as 'fear' is never good sign for quality (!), we need to increase the test coverage for the lookup. This is of course not your job, but if you would have time ... :)

kodonnell commented 8 years ago

Ok, I'm happy to leave it until after 0.8 release . @stefanholder's refactoring in #66 might actually help - I was also playing with on-the-fly lookups in #60.

This is of course not your job, but if you would have time ... :)

OK, I'll do what I can - I'm on holiday this week, but playing round with some unit tests might be relaxing enough (I hope!). Should I create a separate issue to discuss what tests you'd like (in addition to #68)?

Shall I close this PR, for now?

karussell commented 8 years ago

@stefanholder's refactoring in #66 might actually help

Is now merged

I'm on holiday this week, but playing round with some unit tests might be relaxing enough (I hope!)

I'm not entirely sure ;)

Should I create a separate issue to discuss what tests you'd like (in addition to #68)?

You can reuse #68 :)

Shall I close this PR, for now?

Probably only if you create a new issue summarizing like 'Take careful steps for a better location lookup in general', but we can also keep it open

stefanholder commented 7 years ago

@karussell: Regarding your previous comment

So what it [findNClosest] mainly does: it does a location lookup and then adds edges via BreadthFirstSearch until a certain distance

What do you mean with certain distance, is this returnAllResultsWithin?

I'm not sure if a BreadthFirstSearch really happens until the search distance. Because of this line it seems to me that the graph traversal only checks immediate neighbors of all nodes returned by location lookup.

karussell commented 7 years ago

What do you mean with certain distance, is this returnAllResultsWithin?

Yes

I'm not sure if a BreadthFirstSearch really happens until the search distance

This is indeed confusing. If we speak about the location index of the core then we create cells of a certain area (QuadTree) and feed every cell with the nodes that are within the boundaries. (Keep in mind that one such cell is not rectangular on earth). Now we could avoid feeding nodes that are reachable by other nodes within the cell, that was the reason I used BFS to save storage space but it turned out that the location index is small enough and we can avoid the graph traversal, of course we should make this more explicit in the code.

Another complex topic of the location index is when no junction node falls inside a cell but still the edge crosses it and it is explained here https://github.com/graphhopper/graphhopper/issues/232 and here https://github.com/graphhopper/graphhopper/pull/221

kodonnell commented 7 years ago

Oh dear, I just deleted my comment! In short, I've had to reimplement for my specific use case (donuts with e.g. radii of 30km and hence many lookups - and lots of data so they need to be really fast). I noticed both of the above, and hence the approach I take is

get all nodes from the location index (but add a slightly wider buffer region)
for each node
- if it is within bounds, add it, and all of it's edges (if we haven't already)
- else, add each of it's edges (and adj nodes) if a) we haven't already tested that edge and b):
- (cheap test) the adj node is within bounds, or, if this fails
- (expensive test) the edge is within bounds (i.e. an edge point - pillar node? - is within bounds, or the line between two edge points intersects the circular radius).

This removes the need for XFirstSearchCheck, etc., and seems more straightforward, as well as maybe catching a few more edge cases (hah, good pun) and a bit faster. However, both (I think) still fail to catch some edges. E.g. if we have a 50m search radius and and edge which passes through it, but whose nodes are 10km away (which is possible - and the 10km depends on index configuration), then it may not be found. I tried some approaches to fix this (e.g. increasing initial location index search region, or starting a BFS search at each candidate node, and continuing until we reach a node which is e.g. more than 1km out of bounds), however I think the 'best' may be to create an edge-based index (as opposed to a node-based one). This will not only be simpler (and, in theory, much faster), but will probably benefit other GH development. @karussell - has this been tried before? If so, and you think it a good idea, it's probably for a new thread ...

@karussell - shall we close this PR and start a new issue (or discussion) around possibilities for a better location lookup?

karussell commented 7 years ago

however I think the 'best' may be to create an edge-based index (as opposed to a node-based one). This will not only be simpler

I currently do not see how. We avoid duplicate checks already (?) and so we just check the edges and nodes that we need. The only thing I see is that with the edge based stuff we could get rid of "graph exploration" (set node & iterate) and fetch the EdgeIteratorState objects directly, we'd have to try if this results in something faster.

but whose nodes are 10km away

But then the bresenham should make its work. Do you have a test case reproducing the issue you have (in mind)?

kodonnell commented 7 years ago

Looks like I have misunderstood - will a node be added to the a tile in the index if one of its edges is in that tile (but the node isn't)? Is that how bresenham is used? If that is the case, then the current node index is (after graph exploration) effectively an edge index.

We avoid duplicate checks already (?)

At one stage I thought we didn't - I think I checked and found we do (since the BFS only searches edges or current node and goes no further).

I currently do not see how.

I was thinking of some sort of quadtree. To build, we iterate through every edge, and add the edge to each tile it passes through (splitting the tile when it has too many edges). (Maybe this is what's already happening anyway, just saving nodes not edges?) Then the lookup of a given tile can simply return the edge IDs instead of node IDs, and, as you say, we avoid the graph exploration stuff - and the code may be simpler.

karussell commented 7 years ago

will a node be added to the a tile in the index if one of its edges is in that tile (but the node isn't)? Is that how bresenham is used?

Yes.

Sometimes bresenham does not work 100% precise (see one of the linked issues) and so it could be that the edge is in the cell although it is only crossing a neighbour cell so there is optimization potential but nothing bad.

kodonnell commented 7 years ago

Closing in favor of graphhopper 994.

graphhopper / map-matching

added find radial method to address #65 #67