mourner / rbush-knn

k-nearest neighbors search (KNN) for RBush
ISC License
212 stars 37 forks source link

Spherical Surface #6

Open phillip-martin opened 7 years ago

phillip-martin commented 7 years ago

From the code below (taken from lines 39-53 of index.js), spherical coordinates may yield funky results.

function compareDist(a, b) {
    return a.dist - b.dist;
}

function boxDist(x, y, box) {
    var dx = axisDist(x, box.minX, box.maxX),
        dy = axisDist(y, box.minY, box.maxY);
    return dx * dx + dy * dy;
}

function axisDist(k, min, max) {
    return k < min ? min - k :
           k <= max ? 0 :
           k - max;
}

1) boxDist, as you probably know, does not support spherical coordinates. Utilizing the Haversine formula in some way for the case of lat/lng could give improved results. 2) axisDist has a longitude wraparound problem. For example, a call to axisDist(-179, 178, 179) will return 357, but -179 is 2 degrees away from the max, 179.

I think there is some cool potential here to add support for spherical coordinates. What do you think?

mourner commented 7 years ago

Thanks for the report! Unfortunately RBush-KNN is not designed to work with spherical coordinates. I'd suggest using https://github.com/darkskyapp/sphere-knn or write your own rbush-sphere-knn library.

phillip-martin commented 7 years ago

Thanks for pointing out https://github.com/darkskyapp/sphere-knn. Unfortunately, I'm seeking out a solution which handles bounding boxes.

I am open to writing an rbush-sphere-knn library. Are there any reasons you are opposed to fitting such functionality into this library? An rbush-sphere-knn library could be identical to this one, with the exception of the boxDist implementation.

mourner commented 7 years ago

RBush is used for many use cases beyond the Earth one (e.g. non-geographical 2D visualizations), and adding another distance metric in a configurable way will probably increase the complexity of the library significantly.

phillip-martin commented 7 years ago

It would be nice if this knn library could have a configurable metric, but you're right about adding complexity. I will update when rbush-sphere-knn is working. Thanks for taking the time to contemplate this.

mourner commented 7 years ago

Mind that for this particular KNN algorithm to work consistently, the custom distance metric should satisfy the condition "the distance to a bounding box is guaranteed to be equal or smaller than the distance to any item contained in it". I'm not sure whether spherical distance satisfies this condition.

Would love to see what you come up with. If a custom metric could be introduced with minimal code changes and no effect on performance, I'd be happy to review a pull request and see whether we could make it a part of this repo.

mourner commented 7 years ago

Thinking about it more, it might be a great idea to make this a part of this repo. Let's look into it!

Thanks to @anandthakker for validating my assumption about suitability of the current KNN approach for the Earth case:

if A is a point outside of region R, and B is a point within R, then the line from B to A must cross the boundary of R at some point X — XA must be <= BA

phillip-martin commented 7 years ago

That's great! Glad to see that this is worthwhile.

I changed the title to omit "Ellipsoid" to avoid any confusion. It seems sufficient to view the earth as a sphere in this case so to avoid any complications of a more accurate model.

if A is a point outside of region R, and B is a point within R, then the line from B to A must cross the boundary of R at some point X — XA must be <= BA

I didn't realize this was a requirement. Are we still unsure if this condition is satisfied for boundaries in the sphere?

mourner commented 7 years ago

@phillip-martin no, that's a proof that the condition is satisfied.

phillip-martin commented 7 years ago

My mistake. Happy to see that's not an issue.

mourner commented 7 years ago

@phillip-martin as a warmup, I solved this problem for points — check out the code here: https://github.com/mourner/geokdbush/. For bounding boxes, the only remaining challenge is to find a good definition of the metric "great circle distance from a point to a bounding box", which is trickier than distance between points.

merlinnot commented 3 years ago

@mourner Hi, I stumbled upon this issue and thought that we could maybe work together to solve it?

For my use case I need only a distance from points (but I need dynamic insertion, so kdbush and flatbush are not an option), but I understand that given the library supports bounding boxes, you'd like to solve both issues at once.

Having read through the thread, I understand that the blocking issue is a definition of "the distance to a bounding box is guaranteed to be equal or smaller than the distance to any item contained in it". Thinking about it, I realized that such a point must be at the edge of the bounding box, so we can simplify the problem to finding the geodesic distance between a point and a line, so that we can take a minimum distance from four edges. Now the question is, how can we calculate the geodesic distance between a point and a line? I found a paper on it and it seems that there's a JavaScript implementation based on it: https://www.npmjs.com/package/geographiclib. This is also visualized here: http://geo.javawa.nl/coordcalc/index_en.html (see second option in the dropdown). They website uses Leaflet, so I though you might like it :wink:

Let me know what you think about it, I'd like to help.