uber / h3-py

Python bindings for H3, a hierarchical hexagonal geospatial indexing system
https://uber.github.io/h3-py
Apache License 2.0
808 stars 130 forks source link

Find minimum cell that contains a set of points #185

Open gpolov-personal opened 3 years ago

gpolov-personal commented 3 years ago

Give a set of lat/long points, how could you find the minimum size cell that contains all the points?

I have come up with a simple algorithm that first extracts the 4 "boundary" points (min latitude, min longitude, max latitude, max longitude) and then iterate over them obtaining their h3 parent ids until the 4 are equal,is there any other method that does that faster?

dfellis commented 3 years ago

So your current approach has a flaw where certain data it will incorrectly require a larger hexagon than actually needed:

IMG_20210410_162626

This hexagon doesn't contain any of the 4 boundary points but does contain all of the actual lat, lng points within.

To answer your question, though, depends on how flexible you are on the definition of the hexagon that "contains" the data. @nrabinowitz has a great demo on this fractal-like partial containment of the indexing system

hex_fractal

If you're okay with the hexagon containment actually having the shape listed above, you can use the Bit format of the H3 Index itself to determine which resolution "contains" the data, based on this fractal shape.

First you index all of the points in res 15, then you use something like h3ToComponents to extract the base cell (res 0) value and the child values (res 1 to 15) and store them. Then take the first set of components from the first index and put them into a new array of base sell and resolution values, and set the resolution field to 15. Now iterate through each of the remaining indexes starting from the base cell until you run across the first resolution that doesn't match what's stored. Then set that resolution and all resolutions after in your tracking data structure to 7 (invalid) and set the resolution value to one higher (so if the difference was found in resolution 9, set it to 8. If ever the base cell doesn't match, you cannot contain the points in a single H3 index and you should error out.

This is parent-child containment, not actual containment with the hexagon in question. But if you want the actual containment for the hexagon in question, now you do one last thing. You take the resolution you found in the algorithm above and you re-index all of the points again. If they all match the hexagon you found, you're done, if not, then it is most likely one resolution up, so try that one and see if everything matches. It is possible if one of the points of the fractal-like hexagon shape crossed multiple actual parents if it was on the very edge of an edge hexagon. In this case, you have to simply re-index at higher and higher resolutions until you actually find a match, but at least you know it is between 0 and the resolution the first algorithm found, so there is less total work necessary.

gpolov-personal commented 3 years ago

First of all thank you very much for your extensive explanation. I am ok with this (non)containment. My goal is to create an index for sets of points (actually they are paths) so when searching over which trajectories lie in near or cross specific places I can filter them out using the H3 index. The thing is that I think I expressed badly the concept of the "boundary", I meant actual points of the set as the "boundaries", not the corners of the convex parallelogram that contains the points: WhatsApp Image 2021-04-11 at 15 34 08

So I have to iterate through 4 points regardless the number of points in the set. I am using the Python API, I cannot see any h3_to_components so for now I am just iterating through the 4 points until I find a common ancestor of the 4 (using h3_to_parent(cell_id, resolution) getting smaller and smaller resolutions (15,14,13,12,11,10,9,8,7...)).

Anyway, thank you very much again for your help