opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0
837 stars 57 forks source link

Remove requirement that geometry columns must not be a group field #234

Closed tschaub closed 5 months ago

tschaub commented 5 months ago

The new native geometry encodings from #189 are incompatible with the requirement that geometry columns must not be group fields. This branch suggests removing that requirement.

Fixes #233.

cholmes commented 5 months ago

Good catch, though do we want to completely throw out the rule? Like we probably still don't want geometries nested in a group? And we could also just say that the only groupings allowed are the native geometries...

I guess with this we'll probably need a 1.1.1?

tschaub commented 5 months ago

Good catch, though do we want to completely throw out the rule? Like we probably still don't want geometries nested in a group? And we could also just say that the only groupings allowed are the native geometries...

I think this might be a case where having more language in the spec is just going to create confusion.

With the change I've proposed, our one requirement related to nesting/groups is that all geometry columns must be at the root of the schema. This implies (and is the same thing as saying) that geometry columns cannot be nested within other group fields. So I wouldn't add any additional words related to that.

The encoding section already has a lot of language about how the encoding changes the structure of the geometry field, so I'm not sure more language is needed above.

Basically, I think lots of language invites lots of differing interpretations. And with fewer words we risk fewer inconsistencies/misinterpretations/bugs etc.

I guess with this we'll probably need a 1.1.1?

Yeah, I think that makes sense.

cholmes commented 5 months ago

Cool, makes sense to me.

Let's try to get this out soon, but ideally get a few more eyes on this PR.

jorisvandenbossche commented 5 months ago

I guess with this we'll probably need a 1.1.1?

It's actually something we haven't discussed (or had to discuss) before, but do we want a new version number for something like this? Because it is not fixing something in the spec (in the sense of fixing something in implementations / files), but only fixing a wording in the spec. And getting this fixed wording out (in the rendered spec on the website) is definitely good, but do we then actually want files with a version number of "1.1.1" in their metadata?

cholmes commented 5 months ago

It's actually something we haven't discussed (or had to discuss) before, but do we want a new version number for something like this? Because it is not fixing something in the spec (in the sense of fixing something in implementations / files), but only fixing a wording in the spec.

Yeah, I think I could go either way - curious what others think. I agree it seems excessive to have implementations update just a version number when nothing really changes. On the other hand if this was written more formality like an OGC specification I do think this would be a change to the requirements / test cases / validators - indeed I think this came up since we didn't actually have any validators try to test 1.1 before we released. In the OGC version of the spec in requirement 2 it does say 'A geometry SHALL NOT be a group field or nested in a group.' So in that case it's not just wording, but it's changing one of the 'requirements'.

At this point I probably lean slightly towards finding a way to not release a new version, since we have not actually adopted the OGC style of doing things, so there's a case to be made 'it's just wording'. But I'm not sure what we actually do then? Just merge this change to 1.2.0-dev and have 1.1.0 spec not have it? Update 1.1.0 tag and the published version on https://geoparquet.org/releases/v1.1.0/ to include this change? Something else?

cholmes commented 5 months ago

I guess one option would be to cut a 1.1.1 'release' on github (so there's a tag and the website gets updated), but keep the version the same?

Could even call it 1.1.0.1 on github / spec? Or call it 1.1.0-update or something?

kylebarron commented 5 months ago

In Python there's also the idea of "post releases" https://packaging.python.org/en/latest/specifications/version-specifiers/#post-releases

Some projects use post-releases to address minor errors in a final release that do not affect the distributed software (for example, correcting an error in the release notes).

cholmes commented 5 months ago

Oh nice, I like that. Good to use some precedent.

cholmes commented 5 months ago

Merging in, since no matter what we need this wording in. I'm going to try to run with the 'post release' idea and make a branch to propose it, as I'd like to get this wrapped up soon.

tschaub commented 5 months ago

@cholmes - with Semver this would be handled with "build metadata" following the version number - so 1.1.0+P1 would be a version that would follow 1.1.0 and precede 1.2.0 (or 1.1.1 if we decided we wanted that later). The letter P is arbitrary, but we could decide that we want to use P build metadata to indicate our post-release fixes.

We could make some changes to the way the website is built to only show the most recent release considering build metadata.

Please don't follow the version naming convention from Python.

tschaub commented 5 months ago

Here is a summary of the latest tag and build metadata handling: https://github.com/opengeospatial/geoparquet/pull/236#issuecomment-2183547346