isaacbrodsky / h3-duckdb

Bindings for H3 to DuckDB
Apache License 2.0
170 stars 8 forks source link

Performing a polyfill? #78

Closed diehl closed 10 months ago

diehl commented 10 months ago

I've loaded a GeoJSON file into a DuckDB table that has two columns (own, geom) by leveraging the spatial extension.

I assume in order to perform a polyfill of polygons, I need to use the h3_polygon_wkt_to_cells.

In order to get a sense of the output, I tried executing the following SQL:

select h3_polygon_wkt_to_cells(geom,11) from nm_sma limit 1;

I'm seeing the following error message which I don't know how to interpret:

Error: Invalid Error: Invalid WKT: expected a hole loop or ')' at pos 990

Could someone point me in the right direction?

isaacbrodsky commented 10 months ago

This error happens when the text in geom is not in the expected WKT format.

Could you confirm the geom column is in WKT format, and possibly paste a row (as text) or file that exhibits the issue? If this happens for all geometries you try via GeoJSON, could you paste the exact commands you are using to load the files?

diehl commented 10 months ago

@isaacbrodsky I bet that is exactly the case, as I don't know how to convert the geometries into WKT format.

The error message I got when I accidentally left out the resolution parameter in the call seems to suggest the geom type is mismatched. See the following:

D select h3_polygon_wkt_to_cells(geom) from nm_sma limit 1;
Error: Binder Error: No function matches the given name and argument types 'h3_polygon_wkt_to_cells(GEOMETRY)'. You might need to add explicit type casts.
    Candidate functions:
    h3_polygon_wkt_to_cells(VARCHAR, INTEGER) -> UBIGINT[]

LINE 1: select h3_polygon_wkt_to_cells(geom) from nm_s...
               ^
D select h3_polygon_wkt_to_cells(geom,11) from nm_sma limit 1;
Error: Invalid Error: Invalid WKT: expected a hole loop or ')' at pos 990

The command I used to load the GeoJSON file is

create table nm_sma as select * from ST_Read(<filename>);

Here's a snapshot of the table after the data is loaded:

Screenshot 2023-12-17 at 11 52 27 AM
isaacbrodsky commented 10 months ago

Could you try calling the H3 function with select h3_polygon_wkt_to_cells(ST_AsText(geom), 11) from nm_sma limit 1; to convert to WKT first? I wonder if the issue is the library is misinterpreting an internal spatial binary format as text.

diehl commented 10 months ago

Still seeing the following:

D select h3_polygon_wkt_to_cells(ST_AsText(geom), 11) from nm_sma limit 1;
Error: Invalid Error: Invalid WKT: expected a hole loop or ')' at pos 990
isaacbrodsky commented 10 months ago

Since the positions are the same, I think it was converting your geometry to WKT before calling the H3 function. It's possible this is valid WKT that the H3 library doesn't parse correctly, but without a sample of the polygon it would be hard to say exactly what the issue is.

diehl commented 10 months ago

Here's a link to the GeoJSON that I loaded into that table: https://web.tresorit.com/l/twB16#2pwgf7Ai-h7fKAF9fDvxyw

diehl commented 10 months ago

@isaacbrodsky So it looks like it's an issue with particular polygons. I just executed the following successfully:

Screenshot 2023-12-17 at 1 06 14 PM

So it looks like the underlying question is what polygon condition(s) make this operation fail?

diehl commented 10 months ago

Another geometry that fails:

D select h3_polygon_wkt_to_cells(geom, 11) from nm_sma where own = 'Bureau of Land Management' limit 1;
Error: Invalid Error: Invalid WKT: expected a hole loop or ')' at pos 2589
diehl commented 10 months ago

Loaded this dataset into GeoPandas just now and did a check to see if all of the geometries are valid, and they all are. So I'm out of ideas at the moment about what might cause this to break.

diehl commented 10 months ago

Used h3pandas just now to successfully generate the resolution 11 cell IDs for that first polygon. So the mystery continues.

isaacbrodsky commented 10 months ago

I get inconsistent crashing -- if I load a large file with many rows, I see the crash you report. If I load the same geometry from a CSV file with only one row, I get empty output ([]). This seems like it may be a bug in h3-duckdb's interfacing with DuckDB.

diehl commented 10 months ago

Roger that. Please let me know if you need anything else.

isaacbrodsky commented 10 months ago

I investigated this some more and it turns out that in addition to whatever issue causes inconsistent results, it seems my handling of holes in polygon WKT is incorrect. I thought there was no comma between linestrings in WKT polygons, but it turns out there is. With #79 I get a run without errors on a converted (CSV) copy of your input file. Please retest with #79 and let me know if it resolves your issue.

diehl commented 10 months ago

Great to hear @isaacbrodsky - I'll circle back later this week after I kick the tires.

diehl commented 10 months ago

@isaacbrodsky Seems to be working now on my end. Thanks for running this to ground!