One somewhat crude option (but effective, and one that avoids some extra `st_` calls) would be the following:
- If the desired H3 resolution is R, first use R+2 to generate the indexes "covering" the polygon
- For each of the cell indexes identified in step 1, find the unique set of parent indexes at resolution R
- Review what you end up with: it might be a bit "extra chunky" and have some cells that barely touch the original shape, but it gets around the problem of losing coverage cells that were excluded because their centroid was not within the shape
- Also review the result when the original polygon/multipolygon contains holes; you may have custom considerations for when you want to retain or eliminate holes

"Your mileage may vary" with this approach, but hopefully it helps
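To make the oversample-then-take-parents idea concrete, here's a rough sketch on a toy square grid: a quadtree on the unit square stands in for the H3 hierarchy and a circle stands in for the polygon. Every name here is a hypothetical stand-in, not H3 API.

```python
import math

def centroid_in_circle(cell, cx=0.5, cy=0.5, r=0.3):
    """Centroid-in-shape test: the stand-in for the usual polyfill predicate."""
    res, x, y = cell
    size = 1.0 / (1 << res)
    return math.hypot((x + 0.5) * size - cx, (y + 0.5) * size - cy) <= r

def covering_cells(accept_fine, coarse_res, extra_levels=2):
    """Polyfill at a finer resolution, then take the unique parents at coarse_res.

    Cells are (res, x, y) squares on the unit square; each cell has 4 children.
    """
    fine_res = coarse_res + extra_levels
    n = 1 << fine_res
    fine = [(fine_res, x, y) for x in range(n) for y in range(n)
            if accept_fine((fine_res, x, y))]
    # Parent at coarse_res: drop `extra_levels` low bits of each coordinate.
    return {(coarse_res, x >> extra_levels, y >> extra_levels) for (_, x, y) in fine}

# Plain centroid-based fill at the coarse resolution misses boundary slivers;
# the oversampled cover picks up extra cells that touch the shape.
coarse = {(2, x, y) for x in range(4) for y in range(4)
          if centroid_in_circle((2, x, y))}
covered = covering_cells(centroid_in_circle, coarse_res=2)
print(coarse <= covered)  # → True: the cover is a superset of the centroid fill
```

The same shape, expressed with the real H3 API, would use `polygonToCells` at R+2 followed by `cellToParent` at R.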
This approach would work, but caveat emptor. The alternative I generally use, which is computationally expensive but possibly cheaper than the above, and not as memory-intensive:

- Run `polygonToCells` as usual
- Take `gridDisk(cell, 1)` of every sampled cell and add it to a candidate set

This catches a bunch of cases that buffering and other approaches might not (imagine, for example, a polygon with a long thin peninsula sticking out, which might miss cells even at R+2 if it is narrow enough).
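The grid-disk expansion can be sketched the same way; here a toy square grid with an 8-neighborhood stands in for H3 and `gridDisk(cell, 1)`, and a circle stands in for the polygon (all names hypothetical, not H3 API):

```python
import math

def centroid_in_circle(cell, cx=0.5, cy=0.5, r=0.3):
    res, x, y = cell
    size = 1.0 / (1 << res)
    return math.hypot((x + 0.5) * size - cx, (y + 0.5) * size - cy) <= r

def grid_disk_k1(cell, res):
    """All cells within one step of `cell`, itself included: a stand-in for gridDisk(cell, 1)."""
    _, x, y = cell
    n = 1 << res
    return {(res, x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x + dx < n and 0 <= y + dy < n}

res = 4
n = 1 << res
# Step 1: the plain centroid-based fill.
sampled = {(res, x, y) for x in range(n) for y in range(n)
           if centroid_in_circle((res, x, y))}
# Step 2: expand every sampled cell by one ring to build the candidate set.
candidates = set()
for cell in sampled:
    candidates |= grid_disk_k1(cell, res)
# `candidates` now also holds boundary cells whose centroids fall just outside
# the shape; a per-candidate intersection test would finish the cover.
print(len(candidates) > len(sampled))  # → True
```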
Actually, since the current polyfill algorithm starts by tracing outlines and then iteratively fills the shape toward the center, it's possible to adapt the first step (outline calculation) to use another predicate (e.g. intersection vs. centroid) and keep the second half of the algorithm (filling inward) unchanged.
At least, that's the approach I used in h3o, and so far the results are looking good (example), but I could have missed some corner cases.
That's basically the plan for the multi-mode option, unless the alternative algorithm I've been considering turns out to be faster (I have a plan for `polygonToCompactCells` that would allow results to be streamed instead of returned in a giant memory block, but it might be slower). But that's only different from what I'm describing if you're implementing/updating the algo in the library itself, not if you're a library consumer.
Ha interesting!
I also started to think about it when I was trying to optimize the polyfill, but didn't get very far.
In the end I was able to mitigate the memory usage by streaming the cells as they are generated using iterators (now I can polyfill Russia at resolution 10 using only ~100 MB of RAM), so it was less of an issue. But if you want to compact the result afterward you need to load the whole set in memory, so it would be great to have the ability to generate a compacted set from the get-go 🙂
What was the approach/algorithm you had in mind?
Taking this as a good excuse to document my thinking here 😁

`polygonToCompactCells` would take: the polygon, the target resolution `R`, memory for a candidate set, memory for an output set, and a status flag (`START`, `IN_PROGRESS`, `DONE`).

- On `START`: initialize the candidate set with the base cells
- While the candidate set is not empty and the output set is not full, take a cell from the candidate set:
  - If its resolution is less than `R`, test the cell convex hull against the polygon; on a hit, add its children to the candidate set
  - If its resolution is `R`, test the cell according to the polyfill mode (intersection, containment, centerpoint, etc.) and add it to the output set if it passes
- Set the flag to `DONE` when the candidate set is empty, `IN_PROGRESS` otherwise
Some notes here:

- The output might not be fully compacted; if necessary we could run `compactCells` on it at the end (I think - I'd need to check whether `compactCells` can take multi-res input). The other option would be to always test a cell's children within the loop and check for the all-passing case.
- If the output set fills up before the flag is `DONE`, the caller can drain it and call again with the same state. This allows for streaming chunks of output with very little memory usage.
- The candidate set can be a fixed-size array of `uint64`; the first entry would be the index of the last cell in the set, so we don't have to send that separately. Its maximum size is 122 + 6 * R + 1 (122 for the base cells, 6 for each res before R, because we cut out the parent before adding the children, and 7 for the cells at R), so max 214 * 64 bits ~= 1.7Kb.
- We could use the first two `uint64` slots as a bitmap for the base cells, bringing this size down to 94 * 64 bits ~= 752 bytes, but it's probably not worth the extra complexity.
- We could implement `polygonToCells` with a call to `polygonToCompactCells` followed by `uncompactCells`. One downside here is that we'll need memory for the compact set + memory for the uncompacted set. I think it's possible to output uncompacted cells directly, but maintaining state is a little trickier, because you'd add `cellToChildren(cell, R)` to the output set and you might not have enough room, so you'd have to pass the child iterator back to the caller as well. One advantage(?) of this approach is that we have to implement `polygonContainsPolygon` and `polygonIntersectsPolygon` in order to run the algorithm, so adding support for these modes should be trivial.
Improvement from @ajfriend - the candidate set can be a single cell! Simpler, and the memory requirement is down to almost nothing:

`polygonToCompactCells(polygon, res, *currentCell, *out)`

- If `currentCell` is `H3_NULL`: set `currentCell` to base cell 0
- While `currentCell` exists and the output set is not full:
  - If the cell's resolution is less than `R`, test the cell convex hull against the polygon; on a hit, set `currentCell` to the first child and continue
  - If the cell's resolution is `R`, test the cell according to the polyfill mode (intersection, containment, centerpoint, etc.) and add it to the output set if it passes
  - Set `currentCell` to the next cell (looping until we break):
    - If the cell has a next sibling, set `currentCell` to the next sibling and break
    - If the cell is a base cell, set `currentCell` to the next base cell and break, or to `H3_NULL` and break if there are no more base cells
    - Otherwise, set `currentCell` to `cellToParent(cell, res - 1)`

No need for a flag: if `currentCell` is `H3_NULL` you're done; otherwise you can pass the current cell back in with more memory.
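Here's a rough, self-contained sketch of that resumable single-cell traversal. A toy quadtree on the unit square stands in for the H3 hierarchy (4 children instead of 7, one root instead of 122 base cells), a circle stands in for the polygon, and `None` plays the role of `H3_NULL`; none of this is actual H3 API.

```python
import math

def intersects(cell, cx=0.5, cy=0.5, r=0.3):
    """Coarse pruning test: does the cell's bounding box touch the circle?"""
    res, x, y = cell
    size = 1.0 / (1 << res)
    qx = min(max(cx, x * size), x * size + size)  # closest bbox point to the center
    qy = min(max(cy, y * size), y * size + size)
    return math.hypot(qx - cx, qy - cy) <= r

def accepts(cell, cx=0.5, cy=0.5, r=0.3):
    """Fine test at the target resolution: centroid inside the circle."""
    res, x, y = cell
    size = 1.0 / (1 << res)
    return math.hypot((x + 0.5) * size - cx, (y + 0.5) * size - cy) <= r

def next_cell(cell):
    """Advance to the next sibling, walking up to ancestors when siblings run out."""
    while True:
        res, x, y = cell
        if res == 0:
            return None                     # single root here, so no more "base cells"
        idx = (x & 1) + 2 * (y & 1)         # position among the 4 siblings
        if idx < 3:
            nidx = idx + 1
            return (res, (x >> 1) * 2 + (nidx & 1), (y >> 1) * 2 + (nidx >> 1))
        cell = (res - 1, x >> 1, y >> 1)    # last sibling: continue from the parent

def polygon_to_cells_chunk(R, current, max_out):
    """One resumable call: emits up to max_out cells, returns (resume_state, chunk)."""
    out = []
    if current is None:
        current = (0, 0, 0)                 # start from the root
    while current is not None and len(out) < max_out:
        res, x, y = current
        if res < R and intersects(current):
            current = (res + 1, 2 * x, 2 * y)   # descend to the first child
            continue
        if res == R and accepts(current):
            out.append(current)
        current = next_cell(current)
    return current, out

# Stream the whole fill in chunks of 5 cells, with ~constant memory per call.
R = 3
state, cells, first = None, [], True
while first or state is not None:
    first = False
    state, chunk = polygon_to_cells_chunk(R, state, 5)
    cells.extend(chunk)

brute = {(R, x, y) for x in range(1 << R) for y in range(1 << R) if accepts((R, x, y))}
print(set(cells) == brute and len(cells) == len(brute))  # → True
```

For simplicity this sketch emits only resolution-`R` cells (plain `polygonToCells` behavior); the compact variant would additionally detect fully-contained ancestors and emit them instead of descending.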
Really interesting, thanks for sharing 🙏
I'll try to implement this approach in h3o when I get the bandwidth to do so; curious to see how it performs in practice.
One part I'm not sure I understand is this branch:
Set `currentCell` to the next cell:
- Set `currentCell` to `cellToParent(cell, res - 1)`
If a cell at R=1 intersects with the shape, we will explore its children at R=2. Once we've tested the last sibling, we skip the first branch (next sibling) and the second branch (next base cell), and we put the parent cell at R=1 back in `currentCell`. Aren't we stuck in a loop here, or did I miss something?
That's why we have the `while true` loop - this branch doesn't break, so the loop continues and looks for the next sibling of the parent. That way we can walk all the way back up the hierarchy until we get to an ancestor that has a next sibling.
There's almost certainly a nicer way to write this, but that was the first version I thought of.
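To make the walk-up concrete, here's a tiny hypothetical sketch in which a cell is a base-cell number plus the list of child digits (0-6) below it, loosely mirroring the layout of an H3 index; none of these names are real H3 API:

```python
def next_in_dfs(base, digits, num_base_cells=122):
    """Advance to the next cell in depth-first order.

    `digits` is the list of child positions (0-6) from the base cell down to
    the current cell. Returns None when the whole hierarchy is exhausted.
    """
    d = list(digits)
    while d and d[-1] == 6:   # last sibling: drop to the parent and keep looping
        d.pop()
    if d:                     # found an ancestor (or the cell itself) with a next sibling
        d[-1] += 1
        return base, d
    # Walked all the way up: move on to the next base cell, if any.
    return (base + 1, []) if base + 1 < num_base_cells else None

print(next_in_dfs(0, [2, 6]))   # → (0, [3]): the parent's next sibling
print(next_in_dfs(0, [6, 6]))   # → (1, []): next base cell
print(next_in_dfs(121, []))     # → None: done
```

The `while` loop never spins forever because each iteration strictly shortens `d`, so it must terminate at an ancestor with a next sibling or at the base cell.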
Update on this (sorry, I've taken over this issue as a thread on the new polyfill algo):
On the plus side (relevant to the original ticket), I had to implement `cellBoundaryInsidePolygon` for the algorithm, so containment and intersection modes should be a very easy lift once this is merged.
The additional modes have been implemented in https://github.com/uber/h3/pull/796
TLDR: Allow specification of 'contained', 'intersects', 'covers' for `polygonToCells`.

Details:
For my project, I need to be able to convert arbitrary multipolygons to the set of hexes which completely cover the enclosed area.
The current implementation of `polygonToCells` only generates a list of cells whose centroids are located inside of the (multi)polygon. My current workaround is:
1. Buffer the multipolygon outward by `edge_length(res)` using `st_buffer`
2. Use `polygonToCells` to generate the list of cell ids
3. Use `st_overlaps` and `st_contains` to flag all the cells that cover the original multipolygon

Unfortunately, this is somewhat slow, so I'm looking for a faster way to accomplish this.
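The three-step workaround can be sketched on a toy square grid, with a circle standing in for the multipolygon: growing the radius plays the role of `st_buffer`, a centroid fill plays `polygonToCells`, and a box-vs-circle test plays `st_overlaps`/`st_contains` (all names are hypothetical stand-ins):

```python
import math

def cell_bbox(cell):
    res, x, y = cell
    size = 1.0 / (1 << res)
    return x * size, y * size, size

def centroid_in_circle(cell, r):
    x0, y0, size = cell_bbox(cell)
    return math.hypot(x0 + size / 2 - 0.5, y0 + size / 2 - 0.5) <= r

def cell_intersects_circle(cell, r):
    x0, y0, size = cell_bbox(cell)
    qx = min(max(0.5, x0), x0 + size)   # closest point of the cell to the center
    qy = min(max(0.5, y0), y0 + size)
    return math.hypot(qx - 0.5, qy - 0.5) <= r

res, r = 4, 0.3
n = 1 << res
cell_diag = math.sqrt(2) / n            # stands in for edge_length(res)
# 1. "Buffer" the shape by one cell diagonal (st_buffer analogue).
buffered_r = r + cell_diag
# 2. Centroid-fill the buffered shape (polygonToCells analogue).
candidates = [(res, x, y) for x in range(n) for y in range(n)
              if centroid_in_circle((res, x, y), buffered_r)]
# 3. Keep candidates that actually touch the original shape (st_overlaps/st_contains analogue).
cover = [c for c in candidates if cell_intersects_circle(c, r)]
plain = [(res, x, y) for x in range(n) for y in range(n)
         if centroid_in_circle((res, x, y), r)]
print(set(plain) <= set(cover))  # → True: the cover is a superset of the plain fill
```

Buffering by one cell diagonal guarantees that every cell touching the original shape has its centroid inside the buffered shape, which is why step 3 only ever filters candidates down, never misses one.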
(Note that the algorithm described at https://github.com/uber/h3/issues/608#issuecomment-1137852267 won't work if any area of the original polygon is too small/narrow to contain a cell centroid.)