libgeos / geos

Geometry Engine, Open Source
https://libgeos.org
GNU Lesser General Public License v2.1
1.21k stars 361 forks source link

Update GEOS algorithms to support XY CoordinateSequence #872

Open dbaston opened 1 year ago

dbaston commented 1 year ago

The CoordinateSequence implementation introduced in #799 allows a CoordinateSequence to be backed by any of the Coordinate types: XY, XYZ, XYM, and XYZM. However, most of the GEOS library accesses coordinates using CoordinateSequence::getAt<Coordinate>(), which requires assumes the CoordinateSequence is backed by XYZ or XYZM Coordinates. So for now, GEOS pads XY and XYM sequences with Z values, forcing them into XYZ or XYZM.

I made certain parts of the GEOS code base "2D safe" (most of geos::algorithm and geos::operation::valid) by removing XYZ access (using CoordinateSequence::getAt<CoordinateXY> instead of CoordinateSequence::getAt(). As long as we only run these algorithms, this allows removal of the GEOS_COORDSEQ_PADZ define in CoordinateSequence.cpp, so that 2D geometries geometries are backed by XY coordinates rather than XYZ coordinates. I did some performance benchmarking with GEOSisValid and not find a compelling performance benefit. However, enabling the code to function with XY-backed sequences may have benefits for memory-constrained users and would make #747 usable for the most common case of an external XYXY buffer. Running the test suite with GEOS_COORDSEQ_PADZ removed will give some idea of the extent of GEOS code that is "2D safe".

Making 2D algorithms only use 2D access as described above is a fairly easy problem. A trickier issue is the handling of algorithms such as convex hull, distance, and Delaunary triangulation that do not require Z coordinates but preserve them if they are available. When adding M support to overlay, I used a helper function to dispatch to LineIntersector methods that are templated for the correct types. This allows us to contain the use of templates to a single point in the overlay algorithm (IntersectionAdder is not templated). And as a side effect, parts of the overlay code also support XY-backed sequences (and avoid wasted Z interpolation of NaNs).

Dispatch helper:

https://github.com/libgeos/geos/blob/3cab6cb2a8c5a5243a3fa1a8775322d6e290c6a4/include/geos/geom/CoordinateSequences.h#L42-L72

Example usage:

https://github.com/libgeos/geos/blob/3cab6cb2a8c5a5243a3fa1a8775322d6e290c6a4/src/algorithm/LineIntersector.cpp#L175-L213

Other algorithms, such as convex hull, nearest points and Delaunay triangulation, might require different techniques. One way might be to perform the algorithm using CoordinateXY references and then construct a result by casting CoordinateXY* back to whatever the actual backing type was (e.g., CoordinateXYM*).

I'm not sure the effort to make these changes is worth it, but wanted to leave some breadcrumbs for anyone who wants to pursue this farther.

dr-jts commented 1 year ago

Other algorithms, such as convex hull, nearest points and Delaunay triangulation, might require different techniques.

What about representing single coordinates using a structure wide enough to hold all potential ordinates? (Probably CoordinateXYZM, unless there is a reason for something different). That will keep Z and M ordinates if they are present, and allow copying them into newly created CoordinateSequences (which will only keep the ordinates defined for the sequence).

pramsey commented 1 year ago

For XY coordinate sequences, imposing an XYZM struct on top would read out of bounds for the last coordinates X and M values. For internally managed buffers we could just always make our buffers slightly larger than necessary, but for external buffers we'd be in a pickle.

dr-jts commented 1 year ago

That's the issue with using references to Coordinate references, I guess. Is that really so much more efficient?

Or perhaps a hybrid approach, where some access is done just be reference, but if the coordinate will form part of a new geometry, it is copied fully.

dr-jts commented 1 year ago

How about using CoordinateXY references for predicate/metric operations, and a full coordinate copy for constructive operations? Constructive operations tend to do a lot of work anyway, so the cost difference may be low (and worth it if ZM-preserving is a functional goal).

tfiner commented 6 months ago

Why go through all this trouble to maintain dimensions beyond XY for what is essentially "user" data for each and every coordinate?

dr-jts commented 6 months ago

Why go through all this trouble to maintain dimensions beyond XY for what is essentially "user" data for each and every coordinate?