libgeos / geos

Geometry Engine, Open Source
https://libgeos.org
GNU Lesser General Public License v2.1
1.09k stars 338 forks source link

GEOS policy for Z ordinate value population? #1087

Open vadimcn opened 2 weeks ago

vadimcn commented 2 weeks ago

I've observed that in many GEOS operations, such as clipping and interpolation, new polygon vertices are created with a Z coordinate set to NaN. This complicates handling elevation data, as it necessitates manual interpolation of missing values, leading to bugs and sometimes making correct interpolation impossible (e.g. for endpoints of LineStrings resulting from clipping).

It appears that functions like LineSegment::pointAlong could be modified to also interpolate the Z coordinate.

I would like to understand the project's perspective on this issue. Is improved Z handling something GEOS is interested in, or is the omission of Z values a deliberate choice?

A similar question could be raised regarding 'M' values, although the appropriate method for interpolating them is less clear.

pramsey commented 2 weeks ago

I am surprised you are seeing missing Z's in clipping, as the constructive geometry overlay engine does try to in-fill the new points with interpolated Z. Which gives you a hint as to our policy, which is that we're "OK" with it, and for places that currently do not do it, would be fine with adding it.

vadimcn commented 2 weeks ago

I am surprised you are seeing missing Z's in clipping

I was using shapely's clip_by_rect on a LineString (which maps to GEOSClipByRect, as I understand). This does work when using the general intersection function.

Which gives you a hint as to our policy [...]

I think there should be explicit guidelines on how Z coordinates are handled, to avoid confusion among project users and contributors. Although the expected result is clear in the example mentioned, there are other scenarios where it's less obvious. For example:

And so on...

tfiner commented 2 weeks ago

I just got bit by this define deep down in CoordinateSequence: GEOS_COORDSEQ_PADZ which erroneously makes space for Z's, but then crucially, does not set hasZ!
Yet, when you ask a CoordinateSequence for its CoordinateType, it returns XYZ. Sigh. If I bother to construct a CoordinateSequence, and I pass in hasZ=false, then I expect the CoordinateSequence to be: XY. Likewise when I pass in a 2 for dim, it ignores that and makes it an XYZ, yet because it doesn't set hasZ, it also lies and returns getDimension() == 3.

It is never a good design to lie in an API, be explicit, and if you want a default with dimension==3, then make that explicit, but give users the ability to make 2 dimensional CoordinateSequences...

dr-jts commented 2 weeks ago

I just got bit by this define deep down in CoordinateSequence: GEOS_COORDSEQ_PADZ which erroneously makes space for Z's, but then crucially, does not set hasZ!

@tfiner This should be filed as an issue.

tfiner commented 2 weeks ago

Done, see #1090

dbaston commented 2 weeks ago

I just got bit by this define deep down in CoordinateSequence: GEOS_COORDSEQ_PADZ which erroneously makes space for Z's, but then crucially, does not set hasZ!

This is by design - see the comment where GEOS_COORDSEQ_PADZ is defined.

hasZ() reports whether the sequence stores XYZ coordinates, which is different from whether it is capable of storing XYZ coordinates.

tfiner commented 2 weeks ago

I read the comment (buried in the cpp file mind you), I am saying that it was not a good design choice. Offering dimension parameters that are ignored is surprising to say the least, even if it helped someone trying to load 3D points. They should have to specify that, or, you could use the type system so that it would be impossible to screw up.

dbaston commented 2 weeks ago

It doesn't have anything to do with helping someone trying to load 3D points. It has to do with incrementally fitting a variable-dimension capability into a 22-year old library that assumed for 20 years that all coordinates are XYZ. I think this project is pretty receptive to improvements, so if you have a proposal to just use the type system I'd encourage you to share it.

tfiner commented 2 weeks ago

I don't know enough about GEOS, but does it make sense to make templated collections like good old STL? Another option might be to extend the type system to the collections, you already have CoordinateXY and CoordinateXYZ, etc. maybe CoordinateSequence2d vs. CoordinateSequence3d, etc.

BTW, that comment for the define says: "This prevents incorrect results when an XYZ Coordinate is read from a sequence storing XY or XYM.".. You can see why I thought that this was to help someone load XYZ.

That comment also implied that the define was temporary, not something to cope with a 22 year old library. I am new here, and I mentioned this because I needed to project a bunch of points in some 2d polygons, and could not bring myself to project them, one at a time. So I started looking at the pointers to the Coordinates and was completely surprised that although I had given them 2d Coordinates, I had a whole bunch of Nans in between my XY's.

hobu commented 2 weeks ago

I don't know enough about GEOS, but does it make sense to make templated collections like good old STL? Another option might be to extend the type system to the collections, you already have CoordinateXY and CoordinateXYZ, etc. maybe CoordinateSequence2d vs. CoordinateSequence3d, etc.

The place to type yourself to death is Boost::Geometry, but I don't think you'll find life so much better than GEOS as far as dealing with coordinate attributes other than XY. You'll be frustrated with type casting costs, wonder why it makes you differentiate coordinate systems in the type system, and lament having to write your own implementations of basic utilities to i/o common serializations into the library.

If you want a library for high dimensionality points, GEOS isn't what you want. PDAL (GEOS-style memory model) or PCL (types! types! types!) might be what you want, but you haven't stated exactly what you're trying to do.

tfiner commented 2 weeks ago

Thank you Howard, I am a huge fan of PDAL. Yeah, I tried boost geom awhile back, whew, painful.

It sounds like GEOS has a lot of old code that assumes multidimensional points, up to 4. My point was that the coordinates have types, but that the containers don't. My point about types is that if you are going to play with raw pointers, you'd better tell the truth about them, always.

I was mostly griping that the interface for the CoordinateSequence isn't consistent. What I was trying to do was project points using proj. I wanted a way that, given a CS, I could efficiently transform a few million points in one call. The fastest way to do that is to give proj x,y pointers and their strides. You can't get the stride(), because it is private. You can't ask for hasZ(), because it lies (says 2, when it allocated for 3), so you have to ask for the type, which is XYZ and infer the stride from that. The documentation and the header aren't good enough to tell you that, so you must spelunk the code to find a define to find out why.

All this made me think, there has to be a better way of using types in the interface to prevent whatever invariant is desperately trying to be preserved here to the point that explicit dimension parameters are ignored. If you have to have 3 dimensions, don't pretend to support 2.

vadimcn commented 2 weeks ago

@tfiner @hobu @dbaston I feel that you are discussing a somewhat different issue than the one I've raised originally. Let's not conflate them by mixing in same thread.
May I ask you to please move your comments into the newly created issue #1090 and delete them here? Thanks!

vadimcn commented 2 weeks ago

@pramsey I'd love to contribute a fix to GEOS, but I'd like a hint on how to handle the ambiguous cases such as mentioned here. Even if it's just "It's unspecified behavior, do whichever is easier".

dr-jts commented 2 weeks ago

I'd like a hint on how to handle the ambiguous cases such as mentioned here. Even if it's just "It's unspecified behavior, do whichever is easier".

Agreed, there should be a wider policy for Z (and M) handling in GEOS. Best place for this is probably somewhere on https://libgeos.org/ (perhaps as an RFC initially, but ideally under a Specification heading. ). This could start as an RFC issue, which allows commenting (perhaps can just use this issue?)

dr-jts commented 2 weeks ago

There appears to be two parts to your question:

  1. why do some (most?!) operations not do something sensible with Z
  2. what are the policies around computing Z (and M) values for new coordinates?

For (1), the simple reason is that the effort to define and implement Z handling has not been made, due to lack of (prior) demand and resources. PRs will be welcome for this (which will be assisted by some work to address 2)

dr-jts commented 2 weeks ago

Here's some thoughts for general policies on Z handling:

More suggestions welcome.

dr-jts commented 2 weeks ago

See https://github.com/locationtech/jts/issues/626 for a specification of the Z handling added for OverlayNG in JTS (which AFAIK applies to GEOS as well).

This should be summarized in the doc for the OverlayNG class (in both projects). And possibly some of the material can be provided in a page on https://libgeos.org/