planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0
151 stars, 8 forks

Suggest running validate if metadata parsing fails in describe #90

Closed. tschaub closed this pull request 1 year ago.

tschaub commented 1 year ago

This adds a message to the describe command output that suggests running validate when the "geo" metadata is invalid.

Example output:

# gpq describe invalid.geoparquet
╭──────────┬────────┬────────────┬────────────┬─────────────╮
│ COLUMN   │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION │
├──────────┼────────┼────────────┼────────────┼─────────────┤
│ geoid    │ binary │ string     │ 0..1       │ snappy      │
│ geometry │ binary │            │ 0..1       │ snappy      │
├──────────┼────────┴────────────┴────────────┴─────────────┤
│ Rows     │ 3233                                           │
╰──────────┴────────────────────────────────────────────────╯
 ⚠️  Not a valid GeoParquet file (invalid "geo" metadata). Run describe with the --metadata-only flag to see the "geo" metadata value. Run validate for more detail on validation issues.
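For readers who want a feel for the check that drives this message, here is a minimal Go sketch that separates the "missing" and "invalid" cases. It assumes the Parquet file's key/value metadata has already been read into a `map[string]string`. The names `geoStatus` and `classifyGeoMetadata` are hypothetical, and the checks (valid JSON with a `version`, a `primary_column`, and a matching entry in `columns`) are a simplification, not gpq's actual validation rules.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geoStatus is a hypothetical classification of a file's "geo" metadata.
type geoStatus int

const (
	geoValid   geoStatus = iota // "geo" key present and passes the simplified checks
	geoMissing                  // no "geo" key in the key/value metadata
	geoInvalid                  // "geo" key present but unparseable or incomplete
)

func (s geoStatus) String() string {
	switch s {
	case geoValid:
		return "valid"
	case geoMissing:
		return "missing"
	default:
		return "invalid"
	}
}

// geoMetadata holds a small subset of the GeoParquet "geo" metadata fields.
type geoMetadata struct {
	Version       string                     `json:"version"`
	PrimaryColumn string                     `json:"primary_column"`
	Columns       map[string]json.RawMessage `json:"columns"`
}

// classifyGeoMetadata (hypothetical) reports whether the "geo" entry is
// missing, invalid, or valid under the simplified checks described above.
func classifyGeoMetadata(kv map[string]string) geoStatus {
	raw, ok := kv["geo"]
	if !ok {
		return geoMissing
	}
	var meta geoMetadata
	if err := json.Unmarshal([]byte(raw), &meta); err != nil {
		return geoInvalid // not parseable as a JSON object
	}
	if meta.Version == "" || meta.PrimaryColumn == "" {
		return geoInvalid // required fields absent
	}
	if _, ok := meta.Columns[meta.PrimaryColumn]; !ok {
		return geoInvalid // primary column not described in "columns"
	}
	return geoValid
}

func main() {
	fmt.Println(classifyGeoMetadata(map[string]string{}))
	fmt.Println(classifyGeoMetadata(map[string]string{"geo": "{not json"}))
	fmt.Println(classifyGeoMetadata(map[string]string{
		"geo": `{"version":"1.0.0","primary_column":"geometry","columns":{"geometry":{"encoding":"WKB"}}}`,
	}))
}
```

Keeping "missing" and "invalid" as distinct outcomes is what lets describe print a different suggestion for each case.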

In addition, this change adds a suggestion to the describe output when the file is missing the "geo" metadata altogether. In that case, it suggests running convert.

Example output:

# gpq describe not-geo.parquet
╭───────────────────┬────────┬────────────┬────────────┬──────────────╮
│ COLUMN            │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION  │
├───────────────────┼────────┼────────────┼────────────┼──────────────┤
│ registration_dttm │ int96  │            │ 0..1       │ uncompressed │
│ id                │ int32  │            │ 0..1       │ uncompressed │
│ first_name        │ binary │ string     │ 0..1       │ uncompressed │
│ last_name         │ binary │ string     │ 0..1       │ uncompressed │
│ email             │ binary │ string     │ 0..1       │ uncompressed │
│ gender            │ binary │ string     │ 0..1       │ uncompressed │
│ ip_address        │ binary │ string     │ 0..1       │ uncompressed │
│ cc                │ binary │ string     │ 0..1       │ uncompressed │
│ country           │ binary │ string     │ 0..1       │ uncompressed │
│ birthdate         │ binary │ string     │ 0..1       │ uncompressed │
│ salary            │ double │            │ 0..1       │ uncompressed │
│ title             │ binary │ string     │ 0..1       │ uncompressed │
│ comments          │ binary │ string     │ 0..1       │ uncompressed │
├───────────────────┼────────┴────────────┴────────────┴──────────────┤
│ Rows              │ 1000                                            │
╰───────────────────┴─────────────────────────────────────────────────╯
 ⚠️  Not a valid GeoParquet file (missing the "geo" metadata key). Run convert to try to convert it to GeoParquet.
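The mapping from each failure case to the warning text can be sketched in Go as a simple switch. This is a hypothetical illustration, not gpq's code: `geoStatus` and `describeSuggestion` are invented names, and only the message strings are taken from the example outputs above.

```go
package main

import "fmt"

// geoStatus is a hypothetical classification of a file's "geo" metadata.
type geoStatus int

const (
	geoValid geoStatus = iota
	geoMissing
	geoInvalid
)

// describeSuggestion (hypothetical) returns the warning printed under the
// describe table, or "" when the metadata is valid. The message text is
// copied from the example outputs.
func describeSuggestion(status geoStatus) string {
	switch status {
	case geoInvalid:
		return `Not a valid GeoParquet file (invalid "geo" metadata). Run describe with the --metadata-only flag to see the "geo" metadata value. Run validate for more detail on validation issues.`
	case geoMissing:
		return `Not a valid GeoParquet file (missing the "geo" metadata key). Run convert to try to convert it to GeoParquet.`
	default:
		return ""
	}
}

func main() {
	for _, s := range []geoStatus{geoValid, geoMissing, geoInvalid} {
		// Only print a warning when there is something to suggest.
		if msg := describeSuggestion(s); msg != "" {
			fmt.Println("⚠️  " + msg)
		}
	}
}
```

The invalid case points to validate (and --metadata-only) because the metadata exists but is wrong; the missing case points to convert because there is nothing to validate yet.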

Fixes #87.