Use the H3 cell index to create a bounding box and perform a point in bbox operation

uber / h3

Hexagonal hierarchical geospatial indexing system

https://h3geo.org

Apache License 2.0

4.74k stars 455 forks source link

Use the H3 cell index to create a bounding box and perform a point in bbox operation #722

Closed Stophface closed 1 year ago

Stophface commented 1 year ago

With latitude and longitude, there is the possbility to create a bounding box based on xmax/ymax and xmin/ymin. Having coordinates, I can perform a range search to check, if these coordinates are withing the bounding box. Something like

 xmax >= longitude && xmin <= longitude && ymax >= latitude && ymin <= latitude

If all of this is true, I know, my point falls within the bounding box. I wonder if there is similar possibility, using the index of h3. If I define the xmax/ymax and xmin/ymin with the index of the corresponding cell:

topLeftCorner: 8b2d55c256acfff
bottomRightCorner: 8b2d024758b1fff

Could I then use the way the cell index is constructed to perform a similar range search, like with real coordinates? Something like (pseudo code):

point = 8b2d11c1599bfff
if(point[0:4] === topLeftCorner[0:4] && ....

nrabinowitz commented 1 year ago

The order of the indexes does not support this directly. You might be able to do this using local IJ coordinates - see celltolocalij. Note that there are areas of the world (around pentagons) where this may not work well, but in local areas this should be possible.

In general, though, I think the simpler option when you want to check for inclusion in a large region is to have a reverse index - e.g. a table, map, or set of the indexes in the region. Determining if a point is in the area is then just a set inclusion check. It might also be simpler or more efficient to actually check a lat/lng bounding box (based on the cell center lat/lng) and then have a second, more expensive check to determine whether the index is in the region.

Stophface commented 1 year ago

@nrabinowitz Thanks for ypur reply. I am playing around a bit and I am still unsure what is the most efficient way. Currently I am storing locations as cell indexes in a database. I am interested in finding cells indexes that are in a viewport. In order to do that, I am filling the viewport with cell indexes (polygonToCells()) and then I am going through the database, comparing row by row the h3 indexes that is stored in the database with the h3 indexes that fill the viewport. However, this requires comparing two long strings (cell indexes) and I am researching ways how to replace this string = string comparison. I was thinking about performing a x/y bounding box check as well, which should be faster!

dfellis commented 1 year ago

@nrabinowitz Thanks for ypur reply. I am playing around a bit and I am still unsure what is the most efficient way. Currently I am storing locations as cell indexes in a database. I am interested in finding cells indexes that are in a viewport. In order to do that, I am filling the viewport with cell indexes (polygonToCells()) and then I am going through the database, comparing row by row the h3 indexes that is stored in the database with the h3 indexes that fill the viewport. However, this requires comparing two long strings (cell indexes) and I am researching ways how to replace this string = string comparison. I was thinking about performing a x/y bounding box check as well, which should be faster!

So in the database you can use the int64 version of the H3 cell (the string representation is simply a hexadecimal string representing up to 64 bits) as the index and then the query after you do polygonToCells can be a simple parseInt(h3Cell, 16) (or equivalent in your language of choice) to get the 64-bit ints to compare, which should be much faster than a string check.

There's probably other tricks involving larger resolution hexagons and cellToParent that could be faster to but also get some cells that are beyond the size of the viewport that may or may not be worth it, but I think simply using the integer representation of the cells should give you a good perf bump.

Stophface commented 1 year ago

@dfellis is it really that easy? Looking at the answer provided by @nrabinowitz provided to this question https://stackoverflow.com/questions/57810560/how-do-i-convert-an-h3-id-from-hex-to-integer-back-and-forth-in-js it seems there is more to it?

nrabinowitz commented 1 year ago

@dfellis is it really that easy? Looking at the answer provided by @nrabinowitz provided to this question https://stackoverflow.com/questions/57810560/how-do-i-convert-an-h3-id-from-hex-to-integer-back-and-forth-in-js it seems there is more to it?

It's relatively easy if you convert on the BE (assuming some non-JS language) or in the DB rather than in frontend JS. But the main speedup here is probably from an index on the H3 column in the DB.

The bottom line, however, is that H3 doesn't support range queries in a meaningful way for this use case. One option depending on granular your viewport query is - you can often fill with coarse cells, then use this to request fine cells, either by including an additional column indexing the coarse resolution or by doing this in the query (via H3 UDFs in the DB, or by H3 index prefix matching, or by range queries based on the parent index)

dfellis commented 1 year ago

@nrabinowitz is right (again :) ). The speedup I'm expecting this conversion to give you is due to the database being able to use a B-Tree index instead of Hash index (plus a full string comparison for verification purposes of potential matches).

The B-Tree index can also implicitly do range-like querying under the hood, using the min and max H3-as-int64 values from your query to figure out the set of pages the values will be coming from and start pre-loading them from disk before the B-Tree checks complete (though that's a database implementation detail that may or may not be done in whatever database you're using).

In any case, equality checking of integers is basically a single CPU cycle, so it is far faster than the string form. The string representation of H3 is for broad compatibility across languages for data interchange, but the Java and Python H3 bindings expose integer-native APIs for the performance benefits.

I didn't think about the JS integer problems because it looked like python pseudocode in your first post to me, but now I see the triple-equal so I'm not sure what that array-like access syntax is supposed to be from.

Stophface commented 1 year ago

@dfellis and @nrabinowitz thanks for sticking with me. I am developing a mobile app with React-Native, which is a JavaScript Framework. The database I am using is a SQLite database. Because I am communicating with the SQLite database via a react-native library, there is no option for me to use the available h3 bindings for SQLite. I only can use the JavaScript h3 bindings.

The bottom line, however, is that H3 doesn't support range queries in a meaningful way for this use case.

Bummer :)

One option depending on granular your viewport query is - you can often fill with coarse cells, then use this to request fine cells, either by including an additional column indexing the coarse resolution or by doing this in the query (via H3 UDFs in the DB, or by H3 index prefix matching, or by range queries based on the parent index)

That is what I am already implementing. But, there are quite a few edge cases to handle, especially when the parent cells become larger. There are areas in the Viewport which are not covered by parent cells, which need special treatment. Additionally, I am not only working with a rectengular viewport, but also with the geometries of countries... And because H3 fills the geometries "only" with the corresponding cells if the center of a cell falls within the geometry, there are quite a few edge cases... Enlarging the geometry which is filled with cells using https://turfjs.org/docs/#transformTranslate is a solution to this. But, this is a) very expensive on the front end, b) the amount of scaling depends on the h3 resolution (and the geometry) and c) does not work with certain geometries (when the enlarged parts of the same geometrie start to overlap, they sort of mask each other)

nrabinowitz commented 1 year ago

@dfellis and @nrabinowitz thanks for sticking with me. I am developing a mobile app with React-Native, which is a JavaScript Framework. The database I am using is a SQLite database. Because I am communicating with the SQLite database via a react-native library, there is no option for me to use the available h3 bindings for SQLite. I only can use the JavaScript h3 bindings.

The bottom line, however, is that H3 doesn't support range queries in a meaningful way for this use case.

Bummer :)

One option depending on granular your viewport query is - you can often fill with coarse cells, then use this to request fine cells, either by including an additional column indexing the coarse resolution or by doing this in the query (via H3 UDFs in the DB, or by H3 index prefix matching, or by range queries based on the parent index)

because H3 fills the geometries "only" with the corresponding cells if the center of a cell falls within the geometry, there are quite a few edge cases... Enlarging the geometry which is filled with cells using https://turfjs.org/docs/#transformTranslate is a solution to this. But, this is a) very expensive on the front end, b) the amount of scaling depends on the h3 resolution (and the geometry) and c) does not work with certain geometries (when the enlarged parts of the same geometries start to overlap, they sort of mask each other)

My usual approach here is to buffer using H3, not the original geometry - something like (pseudocode)

const cells = new Set(polygonToCells(myPolygon, res));
for (const cell of cells) {
  for (const neighbor of gridDisk(cell, 1)) {
   cells.add(neighbor);
  }
}

Depending on your geometry and how many cells are in the polyfill, this may be faster, and should solve some of the issues you mention.

Stophface commented 1 year ago

@nrabinowitz Ah, that is smart. I will give this a try! I might have to catch a few edge cases where there is an array with ~10k cells on higher resolutions. But this still might be faster than using turf's transformScale()!