pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License
144 stars 30 forks source link

h3fy not completely filling polygon #146

Closed jtamerius closed 2 years ago

jtamerius commented 3 years ago

Hey All,

I've noticed that h3fy function does not completely fill the passed polygon. It looks like it will leave out hexagons that are less than 50% inside the source geodataframe. Here is the call to the function that I used to create the hex grid:

    grid= h3fy(source, clip=True, resolution= 8) # note that same thing occurred when clip=False

The figure below shows the hex-filled source polygon and the source boundary (black line).

image

knaaptime commented 3 years ago

thanks for raising this. I think the fix would be to first buffer the input features by a small margin, generate hexes using that larger geometry, then clip back to the original input. I'm not entirely sure whether that's handled better inside the function or as pre/postprocessing, but seems reasonable we could maybe add a buffer keyword or something

jtamerius commented 3 years ago

I think it should be handled internally if clip=True since the user is expecting the polygons to completely/precisely filled by the hex grid.

martinfleis commented 3 years ago

Agree. This is a bug on our side, we should fix it under the hood.

knaaptime commented 3 years ago

yeah i dunno. I wouldn't call it a bug, (and id' argue it's not really even on our side, but what the h3 library has decided "fills" the unary union of the input, though, I agree the way the docstring is written the user probably expects a completely filled polygon). Whether the issue occurs or not is largely a function of the hexgrid resolution the user requests. If you want a low resolution hexgrid (large hexes), then you'll probably cover an area larger than your study area, sometimes substantially so (and that's why we provided the clip argument)

image

If you want a high resolution hexgrid, then you may get some imperfect edges, analogous to rasterization, where you can't clip pixels so there's always a jagged edge along the perimeter of a polygon that's been rasterized, and the extent of jaggedness depends on the raster's resolution. So i think the short term fix is to rewrite the docstring to make clear we're passing through output from h3. For the actual issue, maybe the easiest way to solve internally would probably be to just generate a hexgrid using the input's total bounds (or something slightly larger) then clip that grid using the input. Otherwise we'd have to work out some heuristic scheme to figure out the appropriate buffer size for the input data akin to #118, or buffer incrementally until the face is filled. If we go that route, my question would be whether that should be the default behavior if you pass clip=True, or if we want an additional argument like force_fill or something

the rasterization analogy also makes me think we could include some additional options for how boundary hexes are handled e.g. include any hex that touches the input vs only those whose centroid touches the input (i.e. the all_touched arg from rasterio). In that case, in the example above passing clip=False would give you essentially the opposite "problem" with hexes that extend beyond the border of your gdf. Each of those could be useful in different scenarios so it's probably worth providing all of those options

ljwolf commented 3 years ago

I'd definitely model it on rasterio.mask.mask. +1 on an all_touched option.