Closed Ipiano closed 2 years ago
I would say that the documentation has never been accurate, rather than recent changes breaking something, but please feel free to set me straight. As you point out, the getAt
methods, which are only way to access the elements of a CoordinateSequence
, return a reference to a Coordinate
. Therefore, any implementation of CoordinateSequence
must already have, or be able to produce, a reference to a Coordinate
that is usable for the life of the CoordinateSequence
.
You can't do much in GEOS without accessing Coordinates. A CoordinateSequence
implementation that does not own a copy of all Coordinates (for example, a CoordinateSequence
backed by three arrays of doubles) is going to have to create and persist a copy of each Coordinate the first time it's accessed. If you're going to have to copy each one of them on access, you might as well copy them all up front and stick them in a contiguous block of memory, no? That cost is miniscule compared to operations you might perform with the library (predicates/overlay/etc)
The flaw I see in CoordinateSequence
is that it is an abstract type, so key methods like getAt
are virtual and slow. But there's essentially no viable implementation other than CoordinateArraySequence
. So we are paying a performance penalty for flexibility that is not needed.
Time to remove the inheritance and have CoordinateSequence
just be the implementation class? (Maybe also time to think about how we really want to handle M (and Z))
Time to remove the inheritance and have CoordinateSequence just be the implementation class?
I think so. I'm toying with a single class that can store (via a union) any of (a) std::vector
of Coordinates (b) a pointer to an external buffer of Coordinates (such as a 3D POINTARRAY
) or (c) a single Coordinate
as a value. That's a fun experiment, but I suspect CoordinateArraySequence
is going to be good enough for all cases.
the documentation has never been accurate, rather than recent changes breaking something
As we've both noted, the current getAt
makes it impossible to store data without using Coordinate
as the underlying type, but I think until recently it would have been possible to back it with std::deque<Coordinate>
(in theory - I'm not sure why you'd want to). But that's now definitely not possible because deque doesn't guarantee contiguous storage and there's assumptions about pointer math validity. That doesn't answer all the questions of how this could work, but it does demonstrate that other implementations could have been possible.
A CoordinateSequence implementation that does not own a copy of all Coordinates ... is going to have to create and persist a copy of each Coordinate
The fact that persisting the coordinate is required was not immediately apparent to me. Clearly it's required now, but the way that CoordinateSequence
has been used previously did not require persisting the coordinate forever, as it would more often be copied or used immediately instead of the pointer stored. Not in all cases, but in enough that it worked for my use.
Regardless, it sounds like you're leaning toward removing the in inheritable nature of CoordinateSequence
, which is a fine solution because it clarifies the expectations of the API.
Thank you for answering the question so quickly!
Time to remove the inheritance and have
CoordinateSequence
just be the implementation class? (Maybe also time to think about how we really want to handle M (and Z))
+1. Time to open an RFC or start a Discussion?
Is there an action to be taken here? The documentation could be removed, but it's not wrong in theory, just a little blind to the limitations of a sub-classing approach in practice.
I would say the quoted section is wrong and should just be removed:
There are some cases in which you might want Geometries to store their points using something other than the GEOS Coordinate class. For example, you may want to experiment with another implementation, such as an array of Xs and an array of Ys. or you might want to use your own coordinate class, one that supports extra attributes like M-values.
Yes, you might want this, but no, GEOS provides no means for you to do so.
I understand that the C++ API should be treated as unstable, and I accept the risks of using an unstable API. This is not a complaint that the API changed. This is a bug in the documentation, which does not match the implementation.
As Geos stands today, it explicitly breaks one of the use cases outlined in the documentation as one that it caters to. I would like to understand if this is intentional or not so that I can decide whether to cut and run, fork the project, or try to fix it.
The documentation of
geos::geom::CoordinateSequence
states:Given this description, and the description of
CoordinateSequence::getAt
, which statesI believe it is reasonable to conclude that a custom
CoordinateSequence
implementation which does not store the underlying data asCoordinate
objects should be allowed to construct temporaryCoordinates
any time it needs to interface with the rest of libGeos. Already, this is problematic because thegetAt
method is defined to return a reference, meaning at the very least the customCoordinateSequence
would have to hold a persistentCoordinate
as a member variable that it could populate and return the reference to.This issue becomes more obvious working with the filter functions
CoordinateSequence::apply_ro
andCoordinateSequence::apply_rw
. TheCoordinateFilter
interface takes pointers toCoordinate
objects, and in some cases these filters make assumptions about the underlying structure of the data or persistence of the pointers. The two specific issues I have come across while attempting to update libGeos are the implementations ofgeos::operation::valid::RepeatedPointFilter
andgeos::algorithm::locate::IndexedPointInAreaLocator::SegmentView
. In the former case, there is an assumption made that the pointers passed in continue to be valid at least until the next pointer is passed in; in the latter case, there's an assumption that the two pointers passed in are adjacent to each other such that the first can be incremented to reach the second.I see that a lot of the recent changes internally have been performance-oriented, and I laud that effort; however, they've come at the expense of an immensely useful piece of the API that has the potential to provide a bigger benefit than reducing the number of
virtual
calls in a hot loop. Allowing users to define their ownCoordinateSequence
type which integrates with the rest of geos and does not require usingCoordinate
as an underlying storage type allows users to utilize the power of this library without having to be constantly copying the data from their custom data storage format into and out of theCoordinate
type.I would like to know if this is an intentional break and using
CoordinateSequence
as I've described isn't actually supported. If so, the documentation should be changed to indicate that this use case is no longer supported. If not, then I believe the API ofCoordinateSequence
should be modified such that it doesn't return references or pass pointers to filters, and that filters should make no assumptions about the layout or persistence of data and should prefer to copy coordinates instead.