zachasme / h3-pg

PostgreSQL bindings for H3, a hierarchical hexagonal geospatial indexing system
Apache License 2.0
273 stars 36 forks source link

h3_polygon_to_cells: Invalid geometries can return millions of Hex IDs #157

Open jmealo opened 1 month ago

jmealo commented 1 month ago

Issue:

I've found that calling h3_polygon_to_cells with invalid geometries can result in as many as 2.3 million hex_ids being returned at a resolution of 7. This is not a bug in h3-pg but, rather how the h3 library handles invalid geometries. It's unclear whether we can guard against this, or should at the very least provide documentation/warnings about the issues and workarounds to mitigate them.

Cause:

Test case:

{
  "type": "Polygon",
  "coordinates": [
    [
      [-148.5, 29.1],
      [-148.5, 63.9],
      [-72.5, 29.1],
      [-72.5, 63.9],
      [-148.5, 29.1]
    ]
  ]
}
WITH h3_cells AS (
  SELECT h3_polygon_to_cells(
    ST_GeomFromGeoJSON('{
      "type": "Polygon",
      "coordinates": [
        [
          [-148.5, 29.1],
          [-148.5, 63.9],
          [-72.5, 29.1],
          [-72.5, 63.9],
          [-148.5, 29.1]
        ]
      ]
    }'),
    7
  ) AS cells
)
SELECT COUNT(1) 
FROM h3_cells;
# count = 2,364,092

Solution:

Upstream issue:

https://github.com/uber/h3/issues/926

zachasme commented 1 month ago

We can definitely handle this more nicely. Principally I'm trying to keep the base h3 extension as close to upstream as possible, instead using the h3-pg extension as a friendlier PostGIS wrapper around those. So I would be open to checking validity or using ST_MakeValid when calling the h3-pg functions.

jmealo commented 1 month ago

@zachasme: The issue I ran into with my particular geometries is that ST_MakeValid() was returning huge bowties, that would still create what I assume to be millions of hexes at higher resolutions. I'm not 100% sure what the correct behavior is. I'm pushing to add ST_IsValid() constraints or triggers to our tables that store geometry/geography.

zachasme commented 1 month ago

I would be okay with adding a flag to the postgis variants of polygon_to_cells that either enables or disables running ST_IsValid on inputs. What do you think of that solution?

jmealo commented 1 week ago

@zachasme: I think that could make sense. I got pulled off this but will be revisiting it in the next few weeks.

Thoughts:

I would be OK with providing a flag, if it defaults to ON. We should verify that checking a flag at runtime isn't more expensive than calling ST_IsValid() 🤔 .