uber / h3-py

Python bindings for H3, a hierarchical hexagonal geospatial indexing system
https://uber.github.io/h3-py
Apache License 2.0

Polyfill failing with US pacific coast data #159

Open mgramagl opened 4 years ago

mgramagl commented 4 years ago

Hi,

I ran into the following problem with h3-py while trying to fill the convex hull of Los Angeles with resolution 9 hexagons.

import h3

geoJson = {
    'type': 'Polygon',
    'coordinates': [[
        [-118.4245, 32.7994],
        [-118.3588, 32.8159],
        [-117.7289, 34.0208],
        [-117.6468, 34.2892],
        [-117.6687, 34.8204],
        [-118.8955, 34.8204],
        [-118.9448, 34.0427],
        [-118.6052, 33.0131],
        [-118.4957, 32.8542],
        [-118.4245, 32.7994],
    ]]
}
hexagons = h3.polyfill(geoJson, 9)
print(list(hexagons))

Unfortunately, this prints [], an empty list.

If I try the same thing with NYC, however, I do get results:

geoJson = {
    'type': 'Polygon',
    'coordinates': [[
        [-74.0229, 40.6808],
        [-73.9627, 40.7355],
        [-73.9134, 40.7958],
        [-73.9244, 40.8779],
        [-73.9353, 40.8834],
        [-73.9846, 40.7958],
        [-74.0229, 40.7082],
        [-74.0229, 40.6808],
    ]]
}
hexagons = h3.polyfill(geoJson, 9)
print(list(hexagons))

Am I missing something?

ajfriend commented 3 years ago

Hmmm. Not sure what's going on here, but I can confirm I'm getting the same results. I tried several h3-py versions and each gave exactly the same results you're getting.

I'll try to take a deeper look to figure out what's going on.

Here's the geo, for reference:

[screenshot of the polygon plotted on a map]
mgramagl commented 3 years ago

Digging into the code, I see that there is an undocumented polyfill_geojson function, which works. Is that the right one to use?
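For reference, a minimal sketch of that call (this assumes polyfill_geojson takes the same (geojson, res) arguments as polyfill and interprets the coordinates in GeoJSON (lng, lat) order, which is only inferred from its name since the function is undocumented):

import h3

# A simplified subset of the Los Angeles polygon above,
# still in GeoJSON (lng, lat) order
geoJson = {
    'type': 'Polygon',
    'coordinates': [[
        [-118.4245, 32.7994],
        [-117.6468, 34.2892],
        [-118.8955, 34.8204],
        [-118.4245, 32.7994],
    ]]
}

# Assumed signature: polyfill_geojson(geojson, res); no flag needed,
# since the GeoJSON coordinate ordering is taken as given
hexagons = h3.polyfill_geojson(geoJson, 6)
print(len(hexagons))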

ajfriend commented 3 years ago

Ah, interesting. Let me take a look!

ajfriend commented 3 years ago

hexagons = h3.polyfill(geoJson, 6, geo_json_conformant=True) should work for your original case. (res=9 is going to give you a ton of hexagons.)

Also, note that the Manhattan example above returns hexagons, but they're the wrong ones unless you use geo_json_conformant=True.

Most of the library uses the (lat, lng) ordering, but the GeoJSON spec uses (lng, lat). That'll continue to trip up folks until we can make it consistent or more clear.
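To make the ordering issue concrete, here's a minimal sketch using the Manhattan polygon from above (nothing new here, just the documented geo_json_conformant flag on polyfill):

import h3

# Manhattan polygon, in GeoJSON (lng, lat) order
geoJson = {
    'type': 'Polygon',
    'coordinates': [[
        [-74.0229, 40.6808],
        [-73.9627, 40.7355],
        [-73.9134, 40.7958],
        [-73.9244, 40.8779],
        [-73.9353, 40.8834],
        [-73.9846, 40.7958],
        [-74.0229, 40.7082],
        [-74.0229, 40.6808],
    ]]
}

# Default: each pair is read as (lat, lng), so the "latitude" here is
# about -74 and the cells land nowhere near Manhattan. (For the Los
# Angeles polygon the swapped "latitude" of about -118 is out of range,
# which is presumably why that case returns nothing at all.)
wrong = h3.polyfill(geoJson, 9)

# With geo_json_conformant=True the pairs are read as (lng, lat),
# so the hexagons cover Manhattan as intended
right = h3.polyfill(geoJson, 9, geo_json_conformant=True)

print(len(wrong), len(right))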

This is a part of the API that I think is still confusing, but that we're stuck with for the time being due to backward compatibility. It's one of the things I'd like to clean up when we move h3 and h3-py to v4.0.

If you have any thoughts on how to make this clearer, or on usability in general, I'd love to hear them! It'll definitely help as we move to v4.0.

mgramagl commented 3 years ago

I use H3 for spatial joins because it is so much faster than any approach based on geometries/R-trees, so polyfill is fundamental for me. IMHO, instead of the dictionary it takes now, polyfill should just take an array of (lat, lng) pairs as input and leave the burden of flipping the values to the user.
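For illustration only, a hypothetical helper along those lines (fill_latlng is not part of h3-py; it just sketches the suggested interface on top of the existing polyfill):

import h3

def fill_latlng(points, res):
    # Hypothetical helper: takes a plain list of (lat, lng) pairs and
    # returns the H3 cells covering that polygon, leaving any coordinate
    # flipping to the caller.
    ring = [list(p) for p in points]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    geojson_like = {'type': 'Polygon', 'coordinates': [ring]}
    return h3.polyfill(geojson_like, res)  # default (lat, lng) ordering

# The Manhattan polygon expressed directly as (lat, lng) pairs
manhattan = [
    (40.6808, -74.0229),
    (40.7355, -73.9627),
    (40.7958, -73.9134),
    (40.8779, -73.9244),
    (40.8834, -73.9353),
    (40.7958, -73.9846),
    (40.7082, -74.0229),
]
print(len(fill_latlng(manhattan, 9)))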

ajfriend commented 3 years ago

Out of curiosity, what spatial join tasks are you doing? I ask because I wonder if we can identify a "core" operation that we might try to optimize.

And how are you computing the spatial joins now?

mgramagl commented 3 years ago

It's a kind of geo lookup table: for instance, mapping a (lat, lon) to a state code. In my experiments (I use Spark), the R-tree-based approach is much slower than the one based on H3 polygons, where I compute an H3 index from the (lat, lon) and then join it against the ids I get from polyfill. For now I trade some precision for simplicity (the state borders are not matched exactly by the hexagons), but that's fine for my use case.
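A minimal sketch of that lookup pattern in plain Python (the "state" polygon and the resolution below are made-up placeholders; in practice the polyfill step would run once offline and the resulting table would be joined against in Spark):

import h3

# Toy region polygon in GeoJSON (lng, lat) order; real state borders
# would come from a shapefile or GeoJSON source.
states = {
    'CA-ish': {
        'type': 'Polygon',
        'coordinates': [[
            [-118.9448, 34.0427],
            [-117.6468, 34.2892],
            [-117.7289, 33.0208],
            [-118.9448, 34.0427],
        ]],
    },
}

RES = 6  # coarser resolution: smaller lookup table, fuzzier borders

# Build the hex id -> state code lookup table once
hex_to_state = {}
for code, poly in states.items():
    for h in h3.polyfill(poly, RES, geo_json_conformant=True):
        hex_to_state[h] = code

# The "join": map each (lat, lon) point to its hex id and look it up
def lookup_state(lat, lon):
    return hex_to_state.get(h3.geo_to_h3(lat, lon, RES))

print(lookup_state(33.9, -118.2))  # expected to land in 'CA-ish'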