opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0
825 stars 56 forks source link

Antimeridian Crossings and bbox #198

Open jwass opened 7 months ago

jwass commented 7 months ago

As mentioned in #191 in https://github.com/opengeospatial/geoparquet/pull/191#discussion_r1488344906, geometries that cross the antimeridian will break the current row group filtering mechanics.

e.g if you have a geometry [-10,-10,10,10] and a [170,-10,-170,10] (crossing A-M) in a single row group, then the row group stats are going to be [-10,-10,10,10] and thus the geometry crossing A-M will not be selected.

Our solution is just say antimeridian crossings are unsupported but we can use this thread to discuss solutions.

rouault commented 7 months ago

A potential solution would be to require that for a AM crossing geometry, the bbox.xmin is in the range [0,180], and add +360 to the bbox.xmax so that it is in the range [180,360]. (the x of the geometries themselves could/should still be in the [-180,180] range) So a geometry with bbox [170,-10,-170,10] would be actually indexed as [170,-10,190,10] in the bounding box columns

jorisvandenbossche commented 7 months ago

@kylebarron is that the same as what you suggested in the meeting two weeks ago (adding 360 to the xmax value)?

While that fixes the issue with the "wrong" rowgroup-level xmax value, how does this work for filtering based on the bbox column itself? Assume you are reading with a bbox filter of [-180, -10, -170, 10]. Using simple x/y min/max comparisons for the bbox, such a filter would also not select the row with the A-M crossing, because that bbox now doesn't overlap with [170,-10,190,10]

rouault commented 7 months ago

Assume you are reading with a bbox filter of [-180, -10, -170, 10]. Using simple x/y min/max comparisons for the bbox, such a filter would also not select the row with the A-M crossing, because that bbox now doesn't overlap with [170,-10,190,10]

True, and I'm pretty sure there is no simple/elegant solution as soon as discontinuities are involved :-) So if you spatial filter near the anti-meridian, you need to issue your spatial filter twice: one with longitudes near -180 and another time with them shifted of +360.

jorisvandenbossche commented 7 months ago

The other option would be to step away from GeoJSON's wording using the "most southwesterly point (min) and most northeasterly point (max)" ? And use the actual min/max of the coordinate values instead. For the example we are using, that would give a bbox of something like [-180, -10, 180, 10]. Of course, that spans a big part of the globe, and makes the bbox very not-specific (making this is selected too often). But at least that means that a naive bbox filter would not miss features, and if you want to avoid this too broad bbox values, as a data producer you can still avoid that by not having geometries crossing the A-M in a single row or row group.

cholmes commented 5 months ago

Do we want to try to do anything here for 1.1? The spec now says:

Note: This technique to use the bounding box to improve spatial queries does not apply to geometries that cross the antimeridian. Such geometries are unsupported by this method.

That seems sufficient to me for now, so I'm going to move the milestone for this out, unless someone wants to tackle something for 1.1

cholmes commented 5 months ago

Discussed on call 6/3/24 - @jorisvandenbossche proposal above seems better since the filter would always be 'correct', so he's going to check what GDAL does / make a PR to discuss. Seems like it'd be clearer in the spec to do this.

paleolimbot commented 5 months ago

I like the idea of using strictly a min/max of coordinates (which implies that edges are not permitted to cross the antimeridian for lon/lat coordinates with planar edges)!

jorisvandenbossche commented 5 months ago

I think I was confusing the case of a split vs non-split geometry (i.e. a geometry that is represented by a Multi geometry cut into two pieces by the antimeridian, vs an actual single geometry where the vertices cross the antimeridian). The suggestion to simply use actual min/max values instead of southwesterly/northeasterly points of course only works in case of split geometries, because otherwise you still get "wrong" stats.

For GeoPandas' use cases, where geopandas does not support geographical coordinates properly (just regard them as cartesian, so for geopandas a line is never crossing the antimeridian, and thus in practice you can only work with split geometries in geopandas), saying to use simple min/max values is fine (and matches with what we currently do).

But for systems that do support proper geographical coordinates (with edges=spherical) with lines crossing the antimeridian, we ideally still need a solution (other than saying this is not supported)? (or follow GeoJSON rfc7946 spec to say that such geometries SHOULD be cut?)

paleolimbot commented 5 months ago

But for systems that do support proper geographical coordinates (with edges=spherical) with lines crossing the antimeridian, we ideally still need a solution

I don't think those systems will ever export a bounding box column (because it is computationally expensive to do so in addition to making no sense). There is also no way to compute this (s2's latlngbounder will only get you a GeoJSON-style southwest/northeast bbox). An s2 or h3 covering would be the method for pushing down a "may intersect" query (and there's no need to consider that until we have a system that can actually read/write those...I'm working slowly on doing this in R!)