opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0

Add version compatibility documentation #229

Closed TheNeuralBit closed 5 months ago

TheNeuralBit commented 5 months ago

This PR proposes some version compatibility documentation, as discussed in #228. I'm just aiming to get the conversation started here, and I welcome any bikeshedding or nits. If someone more closely aligned with the project wants to take this over, I'm happy to support that as well.

cholmes commented 5 months ago

Thanks for the contribution @TheNeuralBit - it looks good! I don't have any particular nits on this, and it'd be a good thing to have in.

kylebarron commented 5 months ago

I removed a trailing space on one line to fix the lint CI check

jorisvandenbossche commented 5 months ago

The examples clarifying the policy look great to me. On the exact terminology of "backwards compatible", I am not entirely sure it is correct, or whether it should be "forwards compatible" in the context of the format spec.

For example, https://arrow.apache.org/docs/dev/format/Versioning.html only uses "backwards compatibility" for library versions, while it uses "forward compatibility" for minor versions of the format:

An increase in the minor version of the format version, such as 1.0.0 to 1.1.0, indicates that 1.1.0 contains new features not available in 1.0.0. So long as these features are not used (such as a new data type), forward compatibility is preserved.

Now, as @paleolimbot mentioned in the meeting, maybe the current wording here is fine, since you can say that a backwards compatible addition to the format spec ensures that forward compatibility is preserved.

jorisvandenbossche commented 5 months ago

As additional context, there is a WIP update for the Parquet format itself to better describe this (https://github.com/apache/parquet-format/pull/258). The current version of the PR defines:

  1. Backwards compatible. A file written under an older version of the format should be readable under a newer version of the format.

  2. Forwards compatible. A file written under a newer version of the format with the feature enabled can be read under an older version of the format, but some information might be missing or performance might be suboptimal.

  3. Forwards incompatible. A file written under a newer version of the format with the feature enabled cannot be read under an older version of the format (e.g. adding and using a new compression algorithm).

This defines "forward compatibility" a bit differently, as we explicitly say that it is fine to add new features as long as old readers can detect that the new feature is being used.
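
To make the "old readers can detect the new feature" idea concrete, here is a minimal sketch of how a reader might sort files into the categories above. The feature names, the required/optional flag, and the `classify` helper are all hypothetical and are not taken from the Parquet or GeoParquet specs.

```python
# Hypothetical sketch: feature names and the file layout are assumptions,
# not part of the Parquet or GeoParquet specifications.

# Features this (older) reader knows how to handle.
SUPPORTED_FEATURES = {"snappy"}


class UnsupportedFeatureError(Exception):
    """Raised when a file requires a feature this reader does not know."""


def classify(file_features: dict) -> str:
    """Decide how an older reader should treat a file.

    file_features maps a feature name to True when the feature is required
    to decode the data, or False when it only adds optional information.
    """
    unknown = {name for name in file_features if name not in SUPPORTED_FEATURES}
    required_unknown = sorted(name for name in unknown if file_features[name])

    if required_unknown:
        # Forwards incompatible: the reader detects the feature and refuses
        # to read, rather than returning wrong data.
        raise UnsupportedFeatureError(f"cannot read: {required_unknown}")
    if unknown:
        # Forwards compatible: readable, but some information is skipped.
        return "readable, some information ignored"
    return "fully readable"


print(classify({"snappy": True}))
print(classify({"snappy": True, "new_statistics": False}))
```

The point of the sketch is only that detection has to be explicit: the reader needs some way to tell a required feature from an optional one before it starts decoding.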

TheNeuralBit commented 5 months ago

Thanks for the comments. I think you're right that we should talk about forward compatibility here. It's not necessarily true that a backwards compatible change is forwards compatible. It would be backwards compatible to add a new field that, when specified, changes the interpretation of some other field, but that's not forwards compatible (an older reader would get incorrect results by ignoring the new field).
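
A small sketch of that failure mode, assuming a made-up `length` field and a made-up optional `length_unit` field that changes how `length` is interpreted (neither is from the GeoParquet spec):

```python
# Hypothetical metadata fields, for illustration only; none of these names
# come from the GeoParquet specification.


def read_v1(metadata: dict) -> float:
    """Older reader: knows only 'length' (in meters) and ignores other keys."""
    return float(metadata["length"])


def read_v2(metadata: dict) -> float:
    """Newer reader: also understands the optional 'length_unit' field."""
    factors = {"m": 1.0, "km": 1000.0}
    unit = metadata.get("length_unit", "m")  # default keeps old files valid
    return float(metadata["length"]) * factors[unit]


old_file = {"length": 5}
new_file = {"length": 5, "length_unit": "km"}

# Backwards compatible: the newer reader agrees with the older one on old files.
assert read_v2(old_file) == read_v1(old_file) == 5.0

# Not forwards compatible: the older reader silently ignores 'length_unit'
# and returns meters where kilometers were intended.
assert read_v1(new_file) == 5.0
assert read_v2(new_file) == 5000.0
```

The older reader does not fail here, it just returns the wrong value, which is why this kind of addition would need more than a minor version bump.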

cholmes commented 5 months ago

Looks great, merging in.