kungfoo / geohash-java

Implementation of GeoHashes in java. We try to be/stay compliant to the spec, as far as possible.
Other
981 stars 310 forks source link

Code Example or API for Simple Nearby Search #29

Closed jonathansds closed 7 years ago

jonathansds commented 7 years ago

Hi, I've been searching everywhere for a clear example of how to use your lib to search for nearby locations. I mean, I just need to implement the following situation:

Please could someone advice me or give me any start point to search about it? Thanks a lot!

kungfoo commented 7 years ago

II'll hack up a sample (we're certainly missing some samples) for this. Until then, here's a quick explanation:

To find points that are potentially within a 100km radius around a point, you'd do something like this:

The point here is that calculating the distance between any two sample points is expensive. You filter out a ton of points by simply matching the prefix of the hash.

The hash is effectively a tree of subdivisions of the 2d range of coordinates and leaves always share the prefix of their parent.

I hope this clears it up a bit.

jonathansds commented 7 years ago

Hi Kungfoo,

Thanks for replying. I still didn't get the correct idea of "precision" :( sorry for my ignorance about the subject.

About the steps, I think I understood well. The problem is I can't find BoundingBoxCircleQuery. I am using the version 1.0.13 so most similar class I can see would be GeoHashCircleQuery which has only one constructor which accepts a WGS84Point so I'm a bit confused.

Please could you provide some lines of code as an example? Thanks so much for your patience.

kungfoo commented 7 years ago

That's actually the one I meant. :) May I suggest you use version 1.1.0 instead? It contains some bugfixes... Or do you have to use the old version for some reason? I'll check and backport the fix if necessary.

jonathansds commented 7 years ago

No no, I'd love to use the latest version. I'm just using the 1.0.13 version cause it was the latest one I could get on Maven repository. I tried to generate the jar from the source code of version 1.3.0 with no success so that's why I'm using the 1.0.13 instead. Could you provide the jar or maven dependence for the latest version ?

Thanks so much

jonathansds commented 7 years ago

Oh Sorry, I've just seen the version 1.3.0 on maven and I could get it successfully! So should I use GeoHashCircleQuery which receives WGS84Point in its constructor? Is fo, to get an instance of WGS84Point I just need the latitude and longitude so I wouldn't need to create any GeoHash, right?

kungfoo commented 7 years ago

It should be there: https://mvnrepository.com/artifact/ch.hsr/geohash/1.3.0

kungfoo commented 7 years ago

I'm hacking up a sample as we speak. I think it will clear things up for you.

kungfoo commented 7 years ago

I created a branch with the sample code on it. You can see the sample code in this test class:

https://github.com/kungfoo/geohash-java/commit/3ebab20014ca6e80bce138f200e559368cc59923#diff-c94e5ca203089fb9a4c84665a474f1ea

It creates 1'000'000 points and then uses the GeoHashCircleQuery to narrow them down to the ones close to a radius. If you run the code, you see that there's points that are in the bounding boxes of the query, but are further away than 100km. This is expected, as the bounding box is a rectangle and you're covering a circle with it. It still prunes the vast majority of the points without using VincentyGeodesy.distanceInMeters() which is computationally expensive (if you have many points). You could also do a string prefix match in a database if you store the hashes there.

If you want to run the tests on your machine do:

mvn test
kungfoo commented 7 years ago

I updated the test so it actually checks back with the actual results to make sure all points that are true positives are in the collection of the ones matched by the query (including false positives).

On my machine (just as an example) doing the math for 1M points takes at least 1s. Using the query takes ~30ms.

Check out the branch: https://github.com/kungfoo/geohash-java/tree/feature/add-sample-of-circle-query

jonathansds commented 6 years ago

Hi Kungfoo,

First, thanks so much for the example code!! That was exactly what I was looking for! Great work, Thanks!

I implemented my code based on your example and it works fine. Now I have just one question, I noticed that the method contains from GeoHashCircleQuery is not a simple "contains" like any Java List but instead it does have some logic inside to match the geohashes. I don't want to query all the objects I have in my database then iterate them checking if it's within GeoHashCircleQuery.searchHashes cause at some stage in the future my database can have millions of objects in this table so I was wondering, If I get the list (searchHashes) from the query and and use it to query my database, should I match the whole hash string or just the start of it? And if it's just the start of the hash, how many characters should I consider? For example, could I do something like:

List models = "all models in my database where model.geohash is contained in GeoHashCircleQuery.searchHashes";

or

List models = "all models in my database where model.geohash.substring(0,NUMBER) is contained in GeoHashCircleQuery.searchHashes"; ?

It might sounds a bit "newbie" but I've just started learning about geohashes so I'm not sure how to match the hashes from GeoHashCircleQuery with the hashes I have in my database.

Thanks for the patience.

jonathansds commented 6 years ago

Hi again Kungfoo, Just one more question, how can I retrieve the list of geohash (string) from the GeoHashCircleQuery? In my database I have the hash from each latitude/longitude like this:

Lat=49.167297, Long=20.131718, GeoHash=u2wrdnekr02y

From GeoHashCircleQuery I can get a list of GeoHash then how to get the hash representation (just like above) from a GeoHash object?

Thanks so much.

kungfoo commented 6 years ago

You can match the prefix in the database, however the characters are suboptimal for this, since each character maps to 5 bits (or subdivisions of the coordinate space).

You get the encoded hash as a String with: GeoHash#toBase32. However, this method only works if the precision is a multiple of 5 (which still can help you prune a massive amount of the points in your database, though). If you want to get around the multiples of 5 limitation, you can store GeoHash#toBinaryString() in the database. The prefix matching will still work.

For example: Two points that are nearby might have the (12bit long hashes, for brevity):

hash point a:   110011000011
hash point b:   110011000110
bounding box:   110011000

A bounding box containing both, with have the same prefix.

jonathansds commented 6 years ago

If I opt for going for the multiple of 5 precision limitation, how can I create instances of WGS84Point with multiple of 5 precision to create a GeoHashCircleQuery with it?

This is how I'm doing now:

GeoHashCircleQuery circleQuery = new GeoHashCircleQuery(new WGS84Point(latitude, longitude), ConstUtil.limit.NEARBY_DISTANCE);

    List < GeoHash > containedGeohashes = circleQuery.getSearchHashes();

    List < String > containedStringGeohashes = new ArrayList<>();`

    for(GeoHash hash : containedGeohashes){
        containedStringGeohashes.add(hash.toBase32());
    }

    Query < ChallengeMW > queryChallenges = ofy().load().type(ChallengeMW.class).filter("geohash IN", containedStringGeohashes);`

But as you said, it's throwing an exception as the hashes don't have a multiple of 5 precision.

In my database I'm storing the hash created from http://geohash.org/.

jonathansds commented 6 years ago

I tried:

GeoHashCircleQuery circleQuery = new GeoHashCircleQuery(GeoHash.withCharacterPrecision(latitude, longitude, 6).getPoint(), ConstUtil.limit.NEARBY_DISTANCE);

and

GeoHashCircleQuery circleQuery = new GeoHashCircleQuery(GeoHash.withBitPrecision(latitude, longitude, 60).getPoint(), ConstUtil.limit.NEARBY_DISTANCE);

But keep receiving:

java.lang.IllegalStateException: Cannot convert a geohash to base32 if the precision is not a multiple of 5.

So I think GeoHashCircleQuery is just ignoring it.

kungfoo commented 6 years ago

GeoHashCircleQuery is ignoring it, because it tries to find the smallest hashes/bounding boxes that cover the circle of the query radius. If you take away 3-4 subdivisions the bounding box gets substantially larger, so you'll end up with more false positives. If you want to go this way, you can take the two methods longValue(long) and `fromLongValue(long, significantbits) to create a larger hash with less bits of precision (which then can be a multiple of 5). Then use that to create the query.

If you want to get around this, then use 'binary' precision, i.e. the hashes as a long with a bit-wise operation or as a long String with a prefix match.

jonathansds commented 6 years ago

Yesterday I ended with the following approach:

Geohash geohash = Geohash. withCharacterPrecision(latitude, longitude,12); String geohashString = geohash.toBase32();

Then worked with geohashString according to the distance I wanted. I mean, matching the prefix of the string according to its precision. For example,

Database:

Name | geohash | id Model 1 | gc7x9813vx2r | 1 Model 2 | gc7x8840suhr | 2 Model 3 | gc30psvp0zgr | 3

Then I want to get all models within 100km radius of this point (53.244664, -6.140530).

`Geohash geohash = Geohash. withCharacterPrecision(53.244664, -6.140530, 12); String geohashString = geohash.toBase32().substring(0, 3); //3 characters for around 100km of precision

ofy().load().type(Model.class).filter("geohash >=", geoHashString).filter("geohash <", geoHashString + "\uFFFD");`

Then it will match only Model 1 and Model 2 which start with "gc7" so they are within 100km radius approximately.

Following this table for precision: https://en.wikipedia.org/wiki/Geohash

I actually could simplify the steps to:

String geohashString = Geohash. withCharacterPrecision(53.244664, -6.140530, 3).toBase32();

I didn't do any performance test yet though but I don't think it'll get much slower. Anyway, if the performance test "fails" I will implement the same but using binaryString instead.

What do you think? Should be fine yea?

kungfoo commented 6 years ago

It wouldn't get any slower since the result would be the same.

However, you are simply creating a bounding box of approximately 100km size. This might be good enough, but the circle query class creates up to four hashes that cover a circle, since the subdivisions might be such that your query point is right next to the border of a bounding box. If so, the ~100km size is going be in the wrong direction, so to speak. You might miss points that are pretty nearby, but just over the border of the bounding box. So you'll probably have to use more than one hash to query the database. You should be able to simply truncate the hashes until there's multiples of 5 bits and then dump them all into a Set<String> (there might be duplicates after trimming off bits) and then use the contents of the Set for the query.

jonathansds commented 6 years ago

Would you mind to provide me some code showing how I could truncate the hashes from GeoHashCircleQuery until they get to have multiples of 5 bits size?

        GeoHashCircleQuery circleQuery = new GeoHashCircleQuery(new WGS84Point(latitude, longitude), ConstUtil.limit.NEARBY_DISTANCE);
    List<GeoHash> containedGeohashes = circleQuery.getSearchHashes();
    Set<String> containedStringGeohashes = new HashSet <>();

    for(GeoHash hash : containedGeohashes){
        //Truncate the hashe here//
        containedStringGeohashes.add(hash.toBase32());
    }
kungfoo commented 6 years ago

I would do something along the lines:

private Set<GeoHash> truncateHashes(GeoHashCircleQuery query) {
        Set<GeoHash> result = new HashSet<>();
        for(GeoHash hash: query.getSearchHashes()) {
            int bitsToRemove = hash.significantBits() % 5;
            String truncatedBinaryHash = hash.toBinaryString().substring(0, hash.significantBits() - bitsToRemove);
            result.add(GeoHash.fromBinaryString(truncatedBinaryHash));
        }
        System.out.println(result);
        return result;
    }

As you will see, you will end up with less than 4 hashes in the set most times.

jonathansds commented 6 years ago

Hi Kungfoo, Sorry for the delay, I just had time to implement and test this solution now. So I implemented the way you suggested and it works like a charm. I tested with 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 KMs and everything works just fine. I also noticed that with 40Km of range the significant bits were always being multiple of 5 so the truncate wasn't needed for 40Km distance. Once I don't have a strict requirement about the distance I'm using 40km from now.

Just wondering, can I assume the hashes will always have a multiple of 5 bits when the query is created with 40km distance for nearby range?