uber / h3-py

Python bindings for H3, a hierarchical hexagonal geospatial indexing system
https://uber.github.io/h3-py
Apache License 2.0
833 stars 132 forks source link

Q: Invalidity of Polygons in GeoJSON, GeoPandas #187

Open tony-zeidan opened 3 years ago

tony-zeidan commented 3 years ago

Hi there,

I am going to make this as clear as I can. I am currently working on a library to plot hexagonally binned data in a geospatial context. So far I have used Plotly graph objects and I've been using H3 for the past 2 months with great results. I had an earlier problem with the way that my GeoJSON Polygons were defined through the 180th meridian (a common issue as far as I can tell). The recommended way to fix the problem was to split these Polygons into two separate Polygons stored in a MultiPolygon however, I found doing this programmatically was a big challenge, so I opted for another method where Polygons that cross have their coordinates shifted properly, and it has worked for every case thus far. The new issue comes where I try to use that same algorithm over Polygons containing either geographic pole and perhaps other Polygons. I have ensured the winding order of my Polygons comply with RCF7946 (RFC 7946 - The GeoJSON Format). Some of these Polygons still appear to be invalid though. I believe the problem either has to do with the invalidity of the Polygon itself or the GeoJSON form of the Polygon.

Again: I believe the issue has to do with Polygons that cross the antimeridian. I believe that GeoPandas determines these polygons are invalid. I am aware that the winding order of the Polygon must be in compliance with RC7946 for them to be plotted properly (https://tools.ietf.org/html/rfc7946) and I ensure this in my code. I have heard that for GeoJSON the recommended action for Polygons that cross the antimeridian is to split them into two Polygons and then stitch them together with a MultiPolygon. This can not be done very easily for Polygons that contain the North Pole so I opted out of this approach. I implemented another approach that translates incorrect coordinates in order to make the Polygon not technically span around the globe and instead over the meridian. This approach can be seen below:

def conform_polygon(poly: Polygon, d3_geo: bool = True) -> Polygon:
    """Conforms the given polygon to the given standard.

    :param poly: The polygon to conform
    :type poly: Polygon
    :param d3_geo: The conform standard (True->Clockwise,False->Counterclockwise)
    :type d3_geo: bool
    :return: The conformed Polygon
    :rtype: Polygon
    """
    poly = _fix_polygon_bounds(poly)
    return polygon.orient(poly, sign=-1) if d3_geo else polygon.orient(poly, sign=1)

def _wraps_lat(poly: Polygon):
    coordinates = list(poly.exterior.coords)
    for x in range(1, len(coordinates)):
        p0 = coordinates[x - 1]
        p1 = coordinates[x]
        if _check_antimeridian(p0[0], p1[0]):
            return True
    return False

def _fix_polygon_bounds(poly: Polygon):
    wraps = _wraps_lat(poly)
    coordinates = list(poly.exterior.coords)
    new_poly = []
    for cord in coordinates:
        lon = _fix_lon(cord[0]) if wraps else cord[0]
        lat = _fix_lat(cord[1])
        new_poly.append((lon, lat))
    return Polygon(new_poly)

This

In one part of my code I wanted to check if a Hexagon contained the North Pole (for testing):

Before and after I conform the Polygons to the GeoJSON standard, I tired printing their validity. I ensure that the GeoDataFrame is in crs='EPSG:4326:

    print('BEFORE')
    for i, row in hgdf.iterrows():
        if not row['validity']:
            print('NOT VALID:', i, row.geometry)
            print('CROSSES:', _wraps_lat(row.geometry))
            hgdf.at[i, 'geometry'] = conform_polygon(row.geometry)

    print('\n\nAFTER')
    hgdf['validity'] = hgdf.is_valid
    for i, row in hgdf.iterrows():
        if not row['validity']:
            print('NOT VALID:', i, row.geometry)
            print('CROSSES:', _wraps_lat(row.geometry))

    return hgdf

Outputs:

BEFORE
NOT VALID: 8003fffffffffff POLYGON ((145.5581976913369 87.36469532319619, -163.4568680790095 76.14556732608257, -131.7088390879296 69.37134141076518, -100.821870205822 67.53592431503803, -66.90449925088507 72.20470505499345, -34.75841798028463 81.27137179020501, 145.5581976913369 87.36469532319619))
CROSSES: True
NOT VALID: 8005fffffffffff POLYGON ((94.14309010184775 76.16304283019099, 117.8129253882809 66.20328559643201, 145.3361375450608 62.49215566258707, 172.8613713986921 66.19292316051032, -163.4568680790095 76.14556732608257, 145.5581976913369 87.36469532319619, 94.14309010184775 76.16304283019099))
CROSSES: True
NOT VALID: 800dfffffffffff POLYGON ((-163.4568680790095 76.14556732608257, 172.8613713986921 66.19292316051032, -175.7482187684136 56.19652224584304, -157.2827709187156 54.01377148101901, -139.6835934816099 59.16948256665965, -131.7088390879298 69.37134141076518, -163.4568680790095 76.14556732608257))
CROSSES: True
NOT VALID: 8017fffffffffff POLYGON ((145.3361375450608 62.49215566258707, 145.3285160689358 51.12825449972591, 158.3828964095353 43.91230401757139, 175.6209968167824 46.18001465709013, -175.7482187684136 56.19652224584304, 172.8613713986921 66.19292316051032, 145.3361375450608 62.49215566258707))
CROSSES: True
NOT VALID: 8023fffffffffff POLYGON ((-161.6470780513571 34.99400669710761, -153.0373918235325 43.44325276436556, -157.2827709187156 54.01377148101901, -175.7482187684135 56.19652224584303, 175.6209968167824 46.18001465709013, -175.5643292250485 35.74120697117649, -161.6470780513571 34.99400669710761))
CROSSES: True
NOT VALID: 8033fffffffffff POLYGON ((177.8469437115915 24.75690942135446, -175.5643292250485 35.74120697117649, 175.6209968167824 46.18001465709013, 158.3828964095353 43.91230401757139, 154.7440406879096 31.88396300415957, 164.4572725136773 23.00472505886228, 177.8469437115915 24.75690942135446))
CROSSES: True
NOT VALID: 8047fffffffffff POLYGON ((-175.5643292250485 35.74120697117649, 177.8469437115915 24.75690942135446, -175.4582711140254 15.28573031248395, -163.7145596599895 14.62307489222751, -156.0899885884176 24.69671125803906, -161.6470780513571 34.9940066971076, -175.5643292250485 35.74120697117649))
CROSSES: True
NOT VALID: 805bfffffffffff POLYGON ((179.2171608248945 5.889921754313913, -175.4582711140254 15.28573031248395, 177.8469437115915 24.75690942135446, 164.4572725136773 23.00472505886228, 160.7733897026288 12.19472672616799, 168.3352524578736 4.467031609784526, 179.2171608248945 5.889921754313913))
CROSSES: True
NOT VALID: 8071fffffffffff POLYGON ((-175.4582711140254 15.28573031248395, 179.2171608248945 5.889921754313892, -176.0569638442135 -3.968796976609585, -165.4167499285883 -5.762860491436919, -158.5996749181401 3.33573485751735, -163.7145596599895 14.62307489222751, -175.4582711140254 15.28573031248395))
CROSSES: True
NOT VALID: 80edfffffffffff POLYGON ((169.5550224552217 -63.09505407752546, 150.1176643555059 -58.03211375817637, 128.4499170789391 -61.33081918088026, 113.0955007491149 -72.20470505499345, 145.2415820197154 -81.27137179020501, -179.6743896480568 -73.31022368544396, 169.5550224552217 -63.09505407752546))
CROSSES: True
NOT VALID: 80f3fffffffffff POLYGON ((-179.6743896480567 -73.31022368544396, 145.2415820197154 -81.27137179020501, -34.44180230866333 -87.36469532319619, -85.85690989815225 -76.16304283019099, -117.6546550434902 -69.39359648991829, -148.1687195009126 -68.92995788193983, -179.6743896480567 -73.31022368544396))
CROSSES: True

AFTER
NOT VALID: 8001fffffffffff POLYGON ((31.8312804990874 68.92995788193983, 62.34534495650978 69.39359648991829, 94.14309010184775 76.16304283019099, 145.5581976913369 87.36469532319619, 325.2415820197153 81.27137179020501, 0.3256103519432604 73.31022368544396, 31.8312804990874 68.92995788193983))
CROSSES: True

It works for almost all of the Polygons, although it seems that when they are in plotted it is not only 8001fffffffffff that is causing the problem.

Here is the GeoJSON object of 8001fffffffffff. Note that the GeoJSON passed into Plotly's choropleth functions contains this GeoJSON object along with all of the others. This is the GeoJSON object AFTER being conformed with my algorithm.

{"geometry": {"coordinates": [[[31.83128, 68.929958], [62.345345, 69.393596], [94.14309, 76.163043], [145.558198, 87.364695], [325.241582, 81.271372], [0.32561, 73.310224], [31.83128, 68.929958]]], "type": "Polygon"}, "id": "8001fffffffffff", "properties": {"value": 1.4913616938342726}, "type": "Feature"}

The invalidity of H3's hexagons in GeoPandas has probably already been brought up, but if anyone knows how to fix the invalidity of 8001fffffffffff (hexagon around the north pole at 0 hex res) please let me know. If any other advice can be given please let me know.

tony-zeidan commented 3 years ago

If anyone is wondering what this looks like in Plotly... image

This is 8001fffffffffff: image

But when the same figure is rotated...

image

8001fffffffffff after being rotated: image

It seems that the hexagon 8003fffffffffff is also problematic although it passes the validity test. The GeoJSON for it right before plotting:

{"geometry": {"coordinates": [[[145.558198, 87.364695], [325.241582, 81.271372], [293.095501, 72.204705], [259.17813, 67.535924], [228.291161, 69.371341], [196.543132, 76.145567], [145.558198, 87.364695]]], "type": "Polygon"}, "id": "8003fffffffffff", "properties": {"value": 2.60422605308447}, "type": "Feature"}

and for some reason on the plot it looks like this when rotated: image

nrabinowitz commented 3 years ago

I'm a little confused here. Is your main issue:

The first issue is a (mostly) known limitation of the current polyfill approach, which may misinterpret polygons with arcs > 180 degrees and/or covering the poles. This is something we would like to address in the next major version of H3 - the tradeoff is that we will need to require proper winding order for the input, which is a breaking change.

The second issue is not one I'm aware of, and if that's a problem it would be great to understand how the H3 output may be invalid.

tony-zeidan commented 3 years ago

It is definitely some combination of both issues. It's good to know that polyfill has its limitations with covering poles and such. I don't think it is that the polygons are being misinterpreted but some are returned invalid, as I am receiving polygons but they are acting weird when plotted and geopandas detects them as invalid even though they are in the right crs. I can't wait for polyfill to be fixed. For the second issue, it seems that all the polygons that are deemed invalid are ones that cross the antimeridian (although I am not 100% sure). I have looked at documentation and come to a personal conclusion that these geometries are deemed invalid as they have self-intersection.

tony-zeidan commented 3 years ago

I'd like to add that in the first issue you mentioned, it was assumed that I was using polyfill here. What I'm actually using is a combination of geo_to_h3 (to get lat long coordinates into cell ids) and h3_to_geo_boundary(geo_json=True) to convert the cell ids into shapely Polygons which are then stored in a GeoDataFrame geopandas object.

Any additional advice would be greatly appreciated.

tony-zeidan commented 3 years ago

I have performed a test similar to the one I posted earlier. I made it print all of the hexagons that will ultimately be shown in my plot (only if they are invalid or cross the antimeridian):

    for i, row in hgdf.iterrows():
        if not row['validity']:
            print('NOT VALID:', i, row.geometry)

        if _wraps_lat(row.geometry):
            print('CROSSES:', i, True)
CROSSES: 8001fffffffffff True
NOT VALID: 8003fffffffffff POLYGON ((145.5581976913369 87.36469532319619, -163.4568680790095 76.14556732608257, -131.7088390879296 69.37134141076518, -100.821870205822 67.53592431503803, -66.90449925088507 72.20470505499345, -34.75841798028463 81.27137179020501, 145.5581976913369 87.36469532319619))
CROSSES: 8003fffffffffff True
NOT VALID: 8005fffffffffff POLYGON ((94.14309010184775 76.16304283019099, 117.8129253882809 66.20328559643201, 145.3361375450608 62.49215566258707, 172.8613713986921 66.19292316051032, -163.4568680790095 76.14556732608257, 145.5581976913369 87.36469532319619, 94.14309010184775 76.16304283019099))
CROSSES: 8005fffffffffff True
NOT VALID: 800dfffffffffff POLYGON ((-163.4568680790095 76.14556732608257, 172.8613713986921 66.19292316051032, -175.7482187684136 56.19652224584304, -157.2827709187156 54.01377148101901, -139.6835934816099 59.16948256665965, -131.7088390879298 69.37134141076518, -163.4568680790095 76.14556732608257))
CROSSES: 800dfffffffffff True
NOT VALID: 8017fffffffffff POLYGON ((145.3361375450608 62.49215566258707, 145.3285160689358 51.12825449972591, 158.3828964095353 43.91230401757139, 175.6209968167824 46.18001465709013, -175.7482187684136 56.19652224584304, 172.8613713986921 66.19292316051032, 145.3361375450608 62.49215566258707))
CROSSES: 8017fffffffffff True
NOT VALID: 8023fffffffffff POLYGON ((-161.6470780513571 34.99400669710761, -153.0373918235325 43.44325276436556, -157.2827709187156 54.01377148101901, -175.7482187684135 56.19652224584303, 175.6209968167824 46.18001465709013, -175.5643292250485 35.74120697117649, -161.6470780513571 34.99400669710761))
CROSSES: 8023fffffffffff True
NOT VALID: 8033fffffffffff POLYGON ((177.8469437115915 24.75690942135446, -175.5643292250485 35.74120697117649, 175.6209968167824 46.18001465709013, 158.3828964095353 43.91230401757139, 154.7440406879096 31.88396300415957, 164.4572725136773 23.00472505886228, 177.8469437115915 24.75690942135446))
CROSSES: 8033fffffffffff True
NOT VALID: 8047fffffffffff POLYGON ((-175.5643292250485 35.74120697117649, 177.8469437115915 24.75690942135446, -175.4582711140254 15.28573031248395, -163.7145596599895 14.62307489222751, -156.0899885884176 24.69671125803906, -161.6470780513571 34.9940066971076, -175.5643292250485 35.74120697117649))
CROSSES: 8047fffffffffff True
NOT VALID: 805bfffffffffff POLYGON ((179.2171608248945 5.889921754313913, -175.4582711140254 15.28573031248395, 177.8469437115915 24.75690942135446, 164.4572725136773 23.00472505886228, 160.7733897026288 12.19472672616799, 168.3352524578736 4.467031609784526, 179.2171608248945 5.889921754313913))
CROSSES: 805bfffffffffff True
NOT VALID: 8071fffffffffff POLYGON ((-175.4582711140254 15.28573031248395, 179.2171608248945 5.889921754313892, -176.0569638442135 -3.968796976609585, -165.4167499285883 -5.762860491436919, -158.5996749181401 3.33573485751735, -163.7145596599895 14.62307489222751, -175.4582711140254 15.28573031248395))
CROSSES: 8071fffffffffff True
CROSSES: 807ffffffffffff True
CROSSES: 809bfffffffffff True
CROSSES: 80bbfffffffffff True
NOT VALID: 80edfffffffffff POLYGON ((169.5550224552217 -63.09505407752546, 150.1176643555059 -58.03211375817637, 128.4499170789391 -61.33081918088026, 113.0955007491149 -72.20470505499345, 145.2415820197154 -81.27137179020501, -179.6743896480568 -73.31022368544396, 169.5550224552217 -63.09505407752546))
CROSSES: 80edfffffffffff True
NOT VALID: 80f3fffffffffff POLYGON ((-179.6743896480567 -73.31022368544396, 145.2415820197154 -81.27137179020501, -34.44180230866333 -87.36469532319619, -85.85690989815225 -76.16304283019099, -117.6546550434902 -69.39359648991829, -148.1687195009126 -68.92995788193983, -179.6743896480567 -73.31022368544396))
CROSSES: 80f3fffffffffff True
nrabinowitz commented 3 years ago

OK, so to be clear: The issue here is that GeoPandas thinks some out from h3_to_geo_boundary is invalid. Is that correct?

None of the polygons that are printed out here are obviously invalid. See for example: https://observablehq.com/@nrabinowitz/h3-index-inspector#8071fffffffffff

This seems like a bug in GeoPandas incorrectly validating polygons that cross the antimeridian. As far as I can see, there is no bug with the H3 output.

kylebarron commented 3 years ago

I haven't followed the entire comment thread as closely as possible, but I'd suggest making a minimal example with a lower level library (i.e. Shapely) than GeoPandas. GeoPandas is high level and wraps several other libraries. Shapely is lower level and provides direct bindings to the low-level GEOS C library.

Instead you should create Shapely polygons directly from whatever h3 function you're using. Then check the geometry.is_valid attribute. If that's False, then that indicates polygons or linear rings created by H3 don't meet GEOS's valid geometry criteria:

A valid LinearRing may not cross itself or touch itself at a single point. A valid Polygon may not possess any overlapping exterior or interior rings. A valid MultiPolygon may not collect any overlapping polygons. Operations on invalid features may fail.

nrabinowitz commented 3 years ago

I assume that the issue here is that the antimeridian-crossing point is assumed to go the wrong direction across the globe:

image

This would yield a polygon that intersected itself. To avoid this, we could convert the trans-meridian points to lat/lng outside the range [-180, 180], but that's likely invalid in other systems. This is a tricky issue, as there's no definitive way to determine which direction was intended by going from longitude -179 to longitude 179, but I don't think the H3 output in this case is invalid.

Related issues:

tony-zeidan commented 3 years ago

I haven't followed the entire comment thread as closely as possible, but I'd suggest making a minimal example with a lower level library (i.e. Shapely) than GeoPandas. GeoPandas is high level and wraps several other libraries. Shapely is lower level and provides direct bindings to the low-level GEOS C library.

Instead you should create Shapely polygons directly from whatever h3 function you're using. Then check the geometry.is_valid attribute. If that's False, then that indicates polygons or linear rings created by H3 don't meet GEOS's valid geometry criteria:

A valid LinearRing may not cross itself or touch itself at a single point. A valid Polygon may not possess any overlapping exterior or interior rings. A valid MultiPolygon may not collect any overlapping polygons. Operations on invalid features may fail.

I have tried as such and they are still invalid. Shapely does not work with geospatial data that well though. I am fairly certain that the LinearRings have self-intersection.

tony-zeidan commented 3 years ago

I assume that the issue here is that the antimeridian-crossing point is assumed to go the wrong direction across the globe:

image

This would yield a polygon that intersected itself. To avoid this, we could convert the trans-meridian points to lat/lng outside the range [-180, 180], but that's likely invalid in other systems. This is a tricky issue, as there's no definitive way to determine which direction was intended by going from longitude -179 to longitude 179, but I don't think the H3 output in this case is invalid.

Related issues:

I have implemented code to do this already and that is what I have been using with high success rates. It doesn't always work and I am currently trying to figure out why.

This is indeed one of the main issues that I was discussing though. The geometries are interpreted as wrapping around the globe in most software and in GeoJSON format.

tony-zeidan commented 3 years ago

In reality, according to the GeoJSON standard, geometries that cross the antimeridian should be cut in two and returned as a MultiPolygon. It does not seem like there is a general solution to this.

nrabinowitz commented 3 years ago

In reality, according to the GeoJSON standard, geometries that cross the antimeridian should be cut in two and returned as a MultiPolygon. It does not seem like there is a general solution to this.

That looks correct: https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.9 - though note that this is a SHOULD (recommendation) and not a MUST (requirement). I can guarantee that if we started cutting into multipolygons other users would be up in arms (and it would render incorrectly in many cases, with a stroke on the cut).

I'd suggest this could be addressed at the binding (e.g. h3-py or h3-js) level with an optional argument to h3_to_geo_boundary, applying only if geojson=True. I think the current core H3 lib behavior is correct, however, or as correct as other options.