planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0

More info about the data? #26

Closed cholmes closed 11 months ago

cholmes commented 1 year ago

The new gpq validation is awesome, but it'd be nice if it was easy to get a few more bits of info:

I could see two routes for this:

1) Report them during validation. For example, the data section could say how many features were validated as having proper info. And instead of just 'all geometry types must be included in geometry_types metadata', it could say 'geometry type metadata is Polygon, and all geometries are polygons'. Similarly for the bounding box: report the bbox and whether all geometries fall within it.

(This does highlight two potential 'warnings': if the reported bbox is much bigger than the actual bounds of the geometry, and if the geometry types are more flexible than needed, e.g. not specified when all the data is actually Polygons. Ideally there'd be nice quick operations in gpq to fix this.)

2) Have an 'info' command like ogrinfo, that just reports on this info.
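
To make the kind of overview described in both routes concrete, here is a minimal Python sketch of the summary itself: feature count, the geometry types actually present, and the combined bounds. It is an illustration only, not how gpq works; the `summarize` function and its input shape (pairs of geometry type name and per-feature bounds) are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def summarize(features: Iterable[Tuple[str, BBox]]) -> dict:
    """Summarize (geometry_type, bounds) pairs.

    Hypothetical helper: the input is assumed to be already extracted
    from the file, one pair per feature.
    """
    counts: Counter = Counter()
    bounds: Optional[BBox] = None
    for geom_type, (xmin, ymin, xmax, ymax) in features:
        counts[geom_type] += 1
        if bounds is None:
            bounds = (xmin, ymin, xmax, ymax)
        else:
            # Grow the running bounds to cover this feature.
            bounds = (
                min(bounds[0], xmin),
                min(bounds[1], ymin),
                max(bounds[2], xmax),
                max(bounds[3], ymax),
            )
    return {
        "feature_count": sum(counts.values()),
        "geometry_types": sorted(counts),
        "bounds": bounds,
    }
```

The result could be printed as-is or compared against the declared `geometry_types` and `bbox` metadata.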

tschaub commented 1 year ago

I think it could make sense to build on the existing describe command for some of this (reporting row count etc). The current output is JSON, but we could accept a --format argument and have text output as well (this could also be the default).

The warnings about an overly large bbox or a larger set of geometry types than used do make sense in the validator.
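
As a rough illustration of those two checks, here is a minimal Python sketch. It is not gpq's validator; the function names, the message wording, and the area-ratio threshold of 2.0 are all assumptions chosen for the example. It relies on the GeoParquet convention that an empty `geometry_types` list means the types are not known.

```python
from typing import Iterable, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox_area(bbox: BBox) -> float:
    # Note: a bbox crossing the antimeridian (xmin > xmax) is not handled
    # here and is treated as having zero area.
    xmin, ymin, xmax, ymax = bbox
    return max(xmax - xmin, 0.0) * max(ymax - ymin, 0.0)


def bbox_is_oversized(declared: BBox, actual: BBox, max_ratio: float = 2.0) -> bool:
    """True if the declared bbox area is more than max_ratio times the actual one.

    max_ratio is an arbitrary threshold picked for this sketch.
    """
    actual_area = bbox_area(actual)
    if actual_area == 0.0:
        # Degenerate data (a single point or axis-aligned line): flag any
        # declared bbox that has a non-zero area.
        return bbox_area(declared) > 0.0
    return bbox_area(declared) / actual_area > max_ratio


def geometry_types_warning(
    declared: Sequence[str], observed: Iterable[str]
) -> Optional[str]:
    """Return a warning if the declared types are looser than the data needs."""
    seen = sorted(set(observed))
    if not seen:
        return None  # no geometries, nothing to compare against
    if not declared:
        return "geometry_types is empty, but all geometries are: " + ", ".join(seen)
    unused = sorted(set(declared) - set(seen))
    if unused:
        return "geometry_types lists unused types: " + ", ".join(unused)
    return None
```

For example, `bbox_is_oversized((-180, -90, 180, 90), (0, 0, 10, 10))` flags a whole-world bbox declared for data that only covers a small area.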

cholmes commented 1 year ago

Cool, that'd make sense to me. Though ideally with an option to turn off the description of all the columns. Sometimes there's a ton of them and I just want an overview of the info.

tschaub commented 11 months ago

The default output format from the gpq describe command is now a more compact table (the --format json output includes more information for nested fields). This includes geometry types and bbox, if they are present for geometry columns in the geo metadata. I didn't change anything about the validator output. Maybe we can open individual issues about adding some "best practices" rules.