uber / h3

Hexagonal hierarchical geospatial indexing system
https://h3geo.org
Apache License 2.0
4.83k stars 459 forks source link

Add additional modes for polygonToCells #775

Closed warnes closed 9 months ago

warnes commented 1 year ago

TLDR: Allow specification of 'contained', 'intersects', 'covers' for polygonToCells.

Details:

For my project, I need to be able to convert arbitrary multipolygons to the set of hexes which completely cover the enclosed area.

The current implementation of polygonToCells only generates a list of cells whose centroids are located inside of the (multi)polygon.

My current workaround is:

  1. expand the original multipolygon by edge_length(res) using st_buffer
  2. use polygonToCells to generate the list of cell ids
  3. Use st_overlaps and st_contains to flag all the cells that cover the original multipolygon.

Unfortunately, this is somewhat slow, so I'm looking for a faster way to accomplish this.

(Note that the algorithm described at https://github.com/uber/h3/issues/608#issuecomment-1137852267 won't work if any area of the original polygon is too small/narrow to contain a cell centroid.).

KirkDybvik commented 12 months ago

One somewhat crude option (but effective and avoids some extra st_ calls) would be the following:

  1. If desired h3 resolution is R, first use R+2 for generating the indexes for "covering" the polygon
  2. For each of the cell indexes identified in step 1, find the unique set of indexes at resolution R
  3. Review what you end up with, it might be a bit "extra chunky" and have some cells that barely touch the original shape - but gets around the challenge of losing coverage cells that did not get included due to cell centroid not within the shape
  4. Review what you end up with in the case where the original polygon/multipolygon contains holes, you may have some custom considerations on when you want to retain or eliminate holes

"Your mileage may vary" with this approach, but hopefully it helps

nrabinowitz commented 12 months ago

One somewhat crude option (but effective and avoids some extra st_ calls) would be the following:

  1. If desired h3 resolution is R, first use R+2 for generating the indexes for "covering" the polygon
  2. For each of the cell indexes identified in step 1, find the unique set of indexes at resolution R
  3. Review what you end up with, it might be a bit "extra chunky" and have some cells that barely touch the original shape - but gets around the challenge of losing coverage cells that did not get included due to cell centroid not within the shape
  4. Review what you end up with in the case where the original polygon/multipolygon contains holes, you may have some custom considerations on when you want to retain or eliminate holes

"Your mileage may vary" with this approach, but hopefully it helps

This approach would work, but caveat emptor:

The alternative I generally use, which is computationally expensive but maybe cheaper than above, but not as memory intensive:

This catches a bunch of cases that buffering and other approaches might not (imagine for example a polygon with a long thin peninsula sticking out, which might miss cells even at R+2 if narrow enough).

grim7reaper commented 12 months ago

Actually, since the current polyfill algorithm starts by tracing outlines and then iteratively fills the shape toward the center it's possible to adapt the first step (outline calculation) to use another predicate (e.g. intersection vs centroid) and keep the second half of the algo (filling inward) unchanged.

At least that's the approach I used in h3o, and so far the results are looking good (example), but I could have missed some corner cases.

nrabinowitz commented 12 months ago

Actually, since the current polyfill algorithm starts by tracing outlines and then iteratively fills the shape toward the center it's possible to adapt the first step (outline calculation) to use another predicate (e.g. intersection vs centroid) and keep the second half of the algo (filling inward) unchanged.

At least that's the approach I used in h3o, and so far the results are looking good (example), but I could have missed some corner cases.

That's basically the plan for the multi-mode option, unless the alternative algorithm I've been considering turns out to be faster (I have a plan for polygonToCompactCells that would allow results to be streamed instead of returned in a giant memory block, but it might be slower). But that's only different from what I'm describing if you're implementing/updating the algo in the library itself, not if you're a library consumer.

grim7reaper commented 12 months ago

(I have a plan for polygonToCompactCells that would allow results to be streamed instead of returned in a giant memory block, but it might be slower).

Ha interesting!

I also started to think about it when I was trying to optimize the polyfill, but didn't get very far.

In the end I was able to mitigate the memory usage by streaming the cells as they are generated using iterators (now I can polyfill Russia at resolution 10 using ~100M of RAM only) so it was less of an issue. But if you want to compact it afterward you need to load the whole set in memory, so it would be great to have the ability to generate compacted set from the get-go 🙂

What was the approach/algorithm you had in mind?

nrabinowitz commented 11 months ago

What was the approach/algorithm you had in mind?

Taking this as a good excuse to document my thinking here 😁

Some notes here:

nrabinowitz commented 11 months ago

One advantage(?) of this approach is that we have to implement polygonContainsPolygon and polygonIntersectsPolygon in order to run the algorithm, so adding support for these modes should be trivial.

nrabinowitz commented 11 months ago

Improvement from @ajfriend - the candidate set can be a single cell! Simpler, and the memory requirement is down to almost nothing:

No need for a flag, if currentCell is H3_NULL you're done, otherwise you can pass the current cell back in with more memory.

grim7reaper commented 11 months ago

Really interesting, thanks for sharing 🙏

I'll try to implement this approach in h3o when I get the bandwidth to do so, curious to see how it perform in practice.

One part I'm not sure to understand is this branch:

Set currentCell to next cell:

  • Set currentCell to cellToParent(cell, res - 1)

If a cell at R=1 intersect with the shape, we will explore its children at R=2 Once we've tested the last sibling we skip the first (next sibling) and second branch (next base cell) and we insert back the parent cell at R=1 in the currentCell => aren't we stuck in a loop here, or did I miss something?

nrabinowitz commented 11 months ago

That's why we have the while true loop - this branch doesn't break, so it continues the loop and looks for the next sibling of the parent. That way we can walk all the way back up the hierarchy until we get to an ancestor that has a next sibling.

There's almost certainly a nicer way to write this, but that was the first version I thought of.

nrabinowitz commented 11 months ago

Update on this (sorry, I've taken over this issue as a thread on the new polyfill algo):

On the plus side (relevant to the original ticket) I had to implement cellBoundaryInsidePolygon for the algorithm, so containment and intersection modes should be a very easy lift once this is merged.

nrabinowitz commented 9 months ago

The additional modes have been implemented in https://github.com/uber/h3/pull/796