opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0

nz-buildings-outlines.parquet sample file uses 'schema_version' instead of 'version' #42

Closed rouault closed 2 years ago

rouault commented 2 years ago

https://storage.googleapis.com/open-geodata/linz-examples/nz-buildings-outlines.parquet has the following 'geo' metadata value:

{
  "primary_column": "geometry",
  "columns": {
    "geometry": {
      "crs": "PROJCRS[\"NZGD2000 / New Zealand Transverse Mercator 2000\",BASEGEOGCRS[\"NZGD2000\",DATUM[\"New Zealand Geodetic Datum 2000\",ELLIPSOID[\"GRS 1980\",6378137,298.257222101,LENGTHUNIT[\"metre\",1]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],ID[\"EPSG\",4167]],CONVERSION[\"New Zealand Transverse Mercator 2000\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",173,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural origin\",0.9996,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8805]],PARAMETER[\"False easting\",1600000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",10000000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"northing (N)\",north,ORDER[1],LENGTHUNIT[\"metre\",1]],AXIS[\"easting (E)\",east,ORDER[2],LENGTHUNIT[\"metre\",1]],USAGE[SCOPE[\"Engineering survey, topographic mapping.\"],AREA[\"New Zealand - North Island, South Island, Stewart Island - onshore.\"],BBOX[-47.33,166.37,-34.1,178.63]],ID[\"EPSG\",2193]]",
      "encoding": "WKB",
      "bbox": [
        1167512.311218509,
        4794679.949937864,
        2089113.650566361,
        6190596.90070761
      ]
    }
  },
  "schema_version": "0.1.0",
  "creator": {
    "library": "geopandas",
    "version": "0.10.2"
  }
}

schema_version should be renamed as version

jorisvandenbossche commented 2 years ago

Ah, good catch .. (and a "damn it!" for myself). The file was created using the released geopandas (which already supported those parquet files), and I thought I had checked that we were compatible with the updated spec, but I missed the "schema_version" vs "version".

That's unfortunate (if I had noticed this before, I might have argued to use "schema_version" in the spec in this repo as well ..), so that means existing geopandas-written files will have the required "version" field missing.

@cholmes I will provide an updated file.

cholmes commented 2 years ago

@jorisvandenbossche - I'd be ok to switch to schema_version if you wanted. Like we could cut a 0.2.0 release. Though I suppose that doesn't actually help the backwards compatibility with geopandas, since it'd be a new version number. But it could just be cleaner all around?

jorisvandenbossche commented 2 years ago

I was first thinking to propose that as well, but it would indeed not actually solve the issue if there is a reader that very strictly checks this.

If we cut a 0.2.0 release with "schema_version", reader implementations would still need to be flexible regarding a missing "version" field for 0.1.0 files written with released geopandas. If we don't change this, reader implementations will also need to be flexible regarding a missing "version" field for them to be able to read those old geopandas files.
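
A minimal sketch of what such a flexible reader could look like, using only the Python standard library. The function name `read_geo_version` is hypothetical, and the fallback to "schema_version" is an assumed policy for tolerating files written by released geopandas, not something the spec prescribes.

```python
import json


def read_geo_version(geo_json: str) -> str:
    """Return the spec version from the JSON string stored under the 'geo' key.

    Prefers the "version" field; falls back to "schema_version"
    (assumed fallback for older geopandas-written files).
    """
    geo = json.loads(geo_json)
    for key in ("version", "schema_version"):
        value = geo.get(key)
        if isinstance(value, str):
            return value
    raise ValueError("'geo' metadata has neither 'version' nor 'schema_version'")
```

A strict reader would instead raise as soon as "version" is absent, which is exactly the case that would reject the existing sample file.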

So maybe the main advantage of doing a more rapid 0.2.0 release, is that it might result in implementations directly supporting that version (and less 0.1.0 files getting written). And that it gives a clear version bump for geopandas to adjust the metadata (I was now planning to do a quick single-patch release of geopandas to change "schema_version" to "version", but that would still result in some 0.1.0 files written by geopandas with the old field and some with the new field).
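
For illustration, the metadata change itself is small. The sketch below renames the field in a 'geo' metadata JSON string; the helper name `rename_schema_version` is hypothetical, and this is not how geopandas implements it, only an assumption about what the adjustment amounts to.

```python
import json


def rename_schema_version(geo_json: str) -> str:
    """Return 'geo' metadata JSON with "schema_version" renamed to "version".

    Metadata that already has a "version" field is left unchanged.
    """
    geo = json.loads(geo_json)
    if "version" not in geo and "schema_version" in geo:
        # Move the value to the new key; all other fields are kept as-is.
        geo["version"] = geo.pop("schema_version")
    return json.dumps(geo)
```

Rewriting the metadata of an existing file would still require writing a new Parquet file, since the 'geo' value lives in the file-level key-value metadata.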

And if we do a quick 0.2.0 release, I think we can just choose whatever of "version" or "schema_version" that we think is the best name (as it doesn't matter that much for the compatibility story for 0.1.0 files)

cholmes commented 2 years ago

> So maybe the main advantage of doing a more rapid 0.2.0 release, is that it might result in implementations directly supporting that version (and less 0.1.0 files getting written). And that it gives a clear version bump for geopandas to adjust the metadata (I was now planning to do a quick single-patch release of geopandas to change "schema_version" to "version", but that would still result in some 0.1.0 files written by geopandas with the old field and some with the new field).

Yeah, that's what I was thinking - have the 0.2.0 instead of the quick single-patch release.

I also think that most readers won't need to strictly check the super early versions of the spec. They'll just not support it, and then that will be a push for anyone with data in older versions to upgrade. And it really is all just announced, with warnings that it may change, so I think it's unlikely there's very much data at all.

Jesus89 commented 2 years ago

I've added a branch with the rename, just in case we want to change it for future versions: https://github.com/opengeospatial/geoparquet/commit/9d94152c6d0bba8f181efcb5377dcb741a4b63b7

cholmes commented 2 years ago

I think we've maybe missed the window for a 'quick release' of 0.2.0?

If Joris has a good reason for schema_version other than just the backwards compatibility (which we didn't quite get right) then I'm open to it, but just 'version' under the 'geo' key seems clear enough to me, and given a choice I prefer shorter. But really don't feel strongly on this one. Just want to be sure we're changing for a good reason.

Jesus89 commented 2 years ago

Agree, if there is no good reason we can keep it as it is and update the files. @jorisvandenbossche what do you think about it?

TomAugspurger commented 2 years ago

I’m happy to keep it as is, since I already have data written with “version” and I don’t think “schema_version” adds much over version.

cholmes commented 2 years ago

Closing this, as the new nz-buildings-outlines sample now uses version, with 0.3

evetion commented 2 years ago

@cholmes Can you point to this v0.3 nz-buildings-outlines sample? The link on the README still points to the 0.1 version.