kungfoo / geohash-java

Implementation of GeoHashes in java. We try to be/stay compliant to the spec, as far as possible.
Other
981 stars 310 forks source link

GeoHashBoundingBoxQuery returns binary that matches points outside box #31

Closed jonathansds closed 6 years ago

jonathansds commented 6 years ago

Hi there, I'm using geohash dependency 1.3.0 and it's being working out of the box so far! Great library!

I'm implementing a BoundingBox search and even though it seems to be really straight forward implementation, I notice that some points outside the box are matched when using the binary geohash prefix. Actually, I'm doing like this:

WGS84Point point1 = new WGS84Point(10.557597041722232, -35.52832642341309);
WGS84Point point2 = new WGS84Point(-41.76269104573268, -68.00914348298193);
GeoHashBoundingBoxQuery boundingBoxQuery = new GeoHashBoundingBoxQuery(new BoundingBox(point1, point2));

List<GeoHash> containedGeohashes = boundingBoxQuery.getSearchHashes();
Set<String> containedStringGeohashes = new HashSet<>();

for(GeoHash hash : containedGeohashes){
    containedStringGeohashes.add(hash.toBinaryString());
}

for(String significantHash : containedStringGeohashes){
    FunctionUtil.log(TAG, "Searching for " + significantHash);
    Query<MXChallenge> queryChallenges = ofy().load().type(MXChallenge.class).filter("binaryGeohash >=", significantHash).filter("binaryGeohash <", significantHash + "\uFFFD").limit(ConstUtil.limit.MAP_CHALLENGES_PER_BOUNDING_BOX);
    ....
}

The problem is, the points above creates a boundingbox pretty much in Brazil, but when I iterate between the Geohash objects returned from boundingBoxQuery.getSearchHashes() and retrieved their binary representations, these are the strings I get:

001
011

And then when I search for those binaries on my database it matches points in for example Ireland:

0111101011000011001001110001100110101010010101100101111001011001 (51.473854, -9.388135)
0111101011000001101101000011100111011100110011000001100100010101 (51.608044, -10.158703)
0111101011000001110001000000001101100011000110010100111110111111 (51.771931, -10.539617)

Those binaries from my database are also generated using geohash-java 1.3.0 like this:

GeoHash.withBitPrecision(latitude, longitude, 64).toBinaryString();

Please can you advise what I am doing wrong? Thanks again for the awesome library!

kungfoo commented 6 years ago

From the binary strings you get from the database it looks like the prefix match should actually succeed, if they are written as the result of .toBinaryString(). Also check out the combined bounding box of the query, since that one can be substantially larger than your query was:

The combined bounding box of your query is:

BOX(-90.0 -90.0,0.0 90.0)

If you compare with Visualization of lat lon it's clear to see why points from Iceland and Ireland will end up in your results.

I can think of two ways to improve this:

You still are leaving out (if I see correctly after a quick glance) 3/4 of the potential results, so this might still be a useful query, but then again, the bounding box you are trying to cover with geohashes in the first place is already huge.

To get a list of hashes that are slightly more precise but still cover your bounding box, have a look at the class BoundingBoxGeoHashIterator. You can do something like this:

https://github.com/kungfoo/geohash-java/blob/a532342d61fb2b293276d238df1708c4e3e6575b/src/test/java/ch/hsr/geohash/Issue31Test.java

I added code there that shows how to use the bounding box iterator to get more precise hashes that cover your bounding box. It also shows that it will not match points in Ireland.

kungfoo commented 6 years ago

I think this should clear up why you get points from Ireland in your query results.

This makes me think that I might want to rework the BoundingBoxQuery class to merge that there is two ways of doing this.

jonathansds commented 6 years ago

Hi Kungfoo, thanks for the reply!

I implemented it exactly as shown on Issue31Test.java file and for the coordinates provided before it works just fine. When performing some more tests I could notice the problem still persists though. The following code:

double latitudeNorth = 45.51500139095788;
double latitudeSouth = -54.947056898528714;
double longitudeEast = -24.979743641602226;
double longitudeWest = -92.20523347952155;

GeoHash point1 = GeoHash.withBitPrecision(latitudeSouth, longitudeWest, 6);
GeoHash point2 = GeoHash.withBitPrecision(latitudeNorth, longitudeEast, 6);

ImmutableList<GeoHash> containedGeohashes = ImmutableList.copyOf(new BoundingBoxGeoHashIterator(new TwoGeoHashBoundingBox(point1, point2)));;
Set<String> containedStringGeohashes = new HashSet<>();

for(GeoHash hash : containedGeohashes){
    containedStringGeohashes.add(hash.toBinaryString());
}

for(String significantHash : containedStringGeohashes){
    Query<MXChallenge> queryChallenges = ofy().load().type(MXChallenge.class).filter("binaryGeohash >=", significantHash).filter("binaryGeohash <", significantHash + "\uFFFD").limit(ConstUtil.limit.MAP_CHALLENGES_PER_BOUNDING_BOX);
    ...
}

Code above produce the following 18 binary prefixes:

000011
000110
000111
001001
001011
001100
001101
001110
001111
010010
010011
010110
011000
011001
011010
011011
011100
011110

Which matches loads if not all points in Ireland :/

I'm using this boundingbox search as a way to "paginate" points (markers) on a map, so the behavior I'm trying to achieve is that the user performs zoom/scroll on the map then I request the server to retrieve all markers (latitude and longitude) that I have on my database that are inside that area the user is looking at. For the coordinates above, the visible area of the map is all South America continent, but then at the moment this request is retrieving all markers in Ireland as well :/

Is there a way to achieve this behaviour?

Thanks for the patience, I'm really newbie when talking about geolocalization...

jonathansds commented 6 years ago

Please Kungfoo I really need your help. Have been playing around trying to figure out this for the past week. I would really appreciate if you could help me with this. Is it even possible to get just the points inside a boundingbox query excluding any coordinates outside the box?

jonathansds commented 6 years ago

I'm thinking about getting the center of the boundingbox then somehow calculate the distance between the center and one of the corners of the box then perform a GeoHashCircleQuery. Do you think it would be a valid option?

kungfoo commented 6 years ago

Actually, this should be the correct way of doing it. If it does indeed not work, then you might have stumbled across a bug in the BoundingBoxIterator class. This was done a while ago and really needs some re-work. As is the nature of the hashes being sub-divisions of the grid, and the iterator erring on the safe side when covering a bounding box, you will always end up with false positives. You would as well when performing a circle query, since that one is also approximated using rectangular hashes, so you will by definition get results outside of the circle. This may however be acceptable, or the result set is so much smaller than the entire data set, that you can do a second pass, actually calculating the distance (which is an expensive operation) to only end up with points in the circle.

So in your case, I'm not entirely sure, but I'd also have to check the hashes manually...

jonathansds commented 6 years ago

Understand. My biggest problem wouldn't be getting points outside the box but not that much difference (searching on South America and ending up bringing points in Europe). So I will try with GeoHashCircleQuery to see if the results are more precise otherwise I think I will start to consider using another lib just for this specific case as everything else is working fine with geohash-java :)

Thanks for the help! :)

kungfoo commented 6 years ago

It sure seems weird that you are getting hashes centered over Europe, when searching with a bounding box in South America. I bet you are hitting some corner case and the thing just returns wrong values. If we can figure out where and why, I'd be happy to fix it.

jonathansds commented 6 years ago

Humm, cool, I will get this rest of this week to perform some more tests and try to figure out why this is happening then. Hopefully I will find out something.

kungfoo commented 6 years ago

I will also try looking into it, but this is some really old code. :) It also needs a refactor... Especially this iterator.