cholmes closed this 1 year ago.
This should be addressed in the latest release (0.19.0).
I grabbed a random userdata1.parquet file and tried this:
# gpq describe userdata1.parquet
╭───────────────────┬────────┬────────────┬────────────┬──────────────╮
│ COLUMN │ TYPE │ ANNOTATION │ REPETITION │ COMPRESSION │
├───────────────────┼────────┼────────────┼────────────┼──────────────┤
│ registration_dttm │ int96 │ │ 0..1 │ uncompressed │
│ id │ int32 │ │ 0..1 │ uncompressed │
│ first_name │ binary │ string │ 0..1 │ uncompressed │
│ last_name │ binary │ string │ 0..1 │ uncompressed │
│ email │ binary │ string │ 0..1 │ uncompressed │
│ gender │ binary │ string │ 0..1 │ uncompressed │
│ ip_address │ binary │ string │ 0..1 │ uncompressed │
│ cc │ binary │ string │ 0..1 │ uncompressed │
│ country │ binary │ string │ 0..1 │ uncompressed │
│ birthdate │ binary │ string │ 0..1 │ uncompressed │
│ salary │ double │ │ 0..1 │ uncompressed │
│ title │ binary │ string │ 0..1 │ uncompressed │
│ comments │ binary │ string │ 0..1 │ uncompressed │
├───────────────────┼────────┴────────────┴────────────┴──────────────┤
│ Rows │ 1000 │
│ Row Groups │ 1 │
╰───────────────────┴─────────────────────────────────────────────────╯
⚠️ Not a valid GeoParquet file (missing the "geo" metadata key). Run convert to try to convert it to GeoParquet.
So then I tried to convert it:
# gpq convert userdata1.parquet maybe-geo.parquet
gpq: error: expected a geometry column named "geometry", use the --input-primary-column to supply a different primary geometry
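The error above appears to come from a name-based lookup: convert expects a column called "geometry" unless told otherwise. A minimal Go sketch of such a lookup follows, assuming a flat list of column names; resolvePrimaryColumn is a hypothetical helper, not gpq's actual code. It also shows why passing first_name gets past this step: the lookup only confirms that a column with that name exists, not that it holds geometries.

```go
package main

import (
	"fmt"
)

// resolvePrimaryColumn is a hypothetical helper (not gpq's implementation).
// It defaults to a column named "geometry" and lets a flag value, such as
// the one passed via --input-primary-column, override that name.
func resolvePrimaryColumn(columns []string, flagValue string) (string, error) {
	want := "geometry"
	if flagValue != "" {
		want = flagValue
	}
	for _, c := range columns {
		if c == want {
			// Only the name is checked; the column's contents are not inspected.
			return c, nil
		}
	}
	if flagValue == "" {
		return "", fmt.Errorf("expected a geometry column named %q, use --input-primary-column to supply a different primary geometry", want)
	}
	return "", fmt.Errorf("no column named %q", want)
}

func main() {
	cols := []string{"registration_dttm", "id", "first_name", "last_name"}
	if _, err := resolvePrimaryColumn(cols, ""); err != nil {
		fmt.Println("error:", err)
	}
	// first_name exists, so the lookup succeeds even though it holds names.
	col, _ := resolvePrimaryColumn(cols, "first_name")
	fmt.Println("using", col)
}
```

Decoding the chosen column is a separate step, which is where the next attempt fails.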
Then I followed the suggestion to try --input-primary-column:
# gpq convert userdata1.parquet maybe-geo.parquet --input-primary-column first_name
gpq: error: wkt: unsupported geometry
All of that is expected (first_name is neither WKT nor WKB). As described in https://github.com/planetlabs/gpq/issues/87#issuecomment-1747587946, this unfortunately would have succeeded with a non-string binary column (gpq trusts that such data is WKB), but validate would then fail.
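To make the WKT/WKB point concrete, here is a rough Go sketch of the kind of cheap sniffing a converter could do before trusting a column. looksLikeWKB and looksLikeWKT are hypothetical helpers, not part of gpq. The WKB check only reads the 5-byte header (a byte-order flag of 0 or 1, then an ISO geometry type code of 1 to 7, optionally plus 1000, 2000, or 3000 for Z, M, or ZM), and the WKT check only looks for a leading geometry keyword, so both can give false positives (a string such as "Point Reyes" would pass the WKT check).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// looksLikeWKB is a hypothetical header-only check. It does not parse
// coordinates, so a value can pass and still be malformed WKB.
func looksLikeWKB(b []byte) bool {
	if len(b) < 5 {
		return false
	}
	var order binary.ByteOrder
	switch b[0] {
	case 0:
		order = binary.BigEndian
	case 1:
		order = binary.LittleEndian
	default:
		return false
	}
	t := order.Uint32(b[1:5])
	// ISO WKB: base type 1..7, plus 1000 (Z), 2000 (M) or 3000 (ZM).
	base, dims := t%1000, t/1000
	return base >= 1 && base <= 7 && dims <= 3
}

var wktKeywords = []string{"POINT", "LINESTRING", "POLYGON", "MULTIPOINT",
	"MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION"}

// looksLikeWKT is a hypothetical check for a leading geometry keyword only.
func looksLikeWKT(s string) bool {
	u := strings.ToUpper(strings.TrimSpace(s))
	for _, k := range wktKeywords {
		if strings.HasPrefix(u, k) {
			return true
		}
	}
	return false
}

func main() {
	name := "Amanda" // an illustrative first_name-style value
	fmt.Println(looksLikeWKB([]byte(name)), looksLikeWKT(name)) // false false
	// Little-endian ISO WKB header for a POINT (type 1); coordinates omitted.
	fmt.Println(looksLikeWKB([]byte{1, 1, 0, 0, 0})) // true
	fmt.Println(looksLikeWKT("POINT (30 10)"))       // true
}
```

A non-string binary column full of arbitrary bytes would usually fail the header check, which is roughly the gap described above.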
A --strict option could be added to the convert command that either applies validation while writing or validates the output after writing. But that would be somewhat involved.
I was checking a few files to see if they were compliant, but wasn't looking very closely and ran convert on one that had no geometries in it. GPQ happily converted it, and describe then reported it as version 1.0.0. That threw me off a bit. I think it's technically valid per the spec, and gpq does appear to write out the geo metadata, but I'm not sure we should call a Parquet file without geometries 1.0.0.
The file also does not validate.
It would be nice for validate to do a 'has geometry column' check first, and simply tell people that the data they're validating has no geometry.
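A 'has geometry column' pre-check for validate might look roughly like the Go sketch below. It assumes the caller has already read the Parquet key/value metadata into a map and the top-level column names into a slice; checkHasGeometry and geoMetadata are hypothetical names. It decodes only the version, primary_column, and columns fields of the GeoParquet "geo" metadata and returns a short, human-readable error before any detailed validation runs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// geoMetadata holds the subset of the GeoParquet "geo" metadata used here.
type geoMetadata struct {
	Version       string                     `json:"version"`
	PrimaryColumn string                     `json:"primary_column"`
	Columns       map[string]json.RawMessage `json:"columns"`
}

// checkHasGeometry is a hypothetical pre-check: kv is the file's key/value
// metadata and schema lists its top-level column names.
func checkHasGeometry(kv map[string]string, schema []string) error {
	raw, ok := kv["geo"]
	if !ok {
		return errors.New(`no "geo" metadata key: this is not a GeoParquet file`)
	}
	var md geoMetadata
	if err := json.Unmarshal([]byte(raw), &md); err != nil {
		return fmt.Errorf(`"geo" metadata is not valid JSON: %w`, err)
	}
	if len(md.Columns) == 0 {
		return errors.New("the data has no geometry columns")
	}
	if _, ok := md.Columns[md.PrimaryColumn]; !ok {
		return fmt.Errorf("primary column %q is not listed in the geo metadata", md.PrimaryColumn)
	}
	inSchema := make(map[string]bool, len(schema))
	for _, name := range schema {
		inSchema[name] = true
	}
	for name := range md.Columns {
		if !inSchema[name] {
			return fmt.Errorf("geometry column %q does not exist in the file", name)
		}
	}
	return nil
}

func main() {
	schema := []string{"id", "first_name"}
	kv := map[string]string{
		"geo": `{"version":"1.0.0","primary_column":"geometry","columns":{"geometry":{"encoding":"WKB","geometry_types":[]}}}`,
	}
	// Metadata names a "geometry" column that the schema does not have.
	fmt.Println(checkHasGeometry(kv, schema))
	// No "geo" key at all.
	fmt.Println(checkHasGeometry(map[string]string{}, schema))
}
```

A file whose geometry column exists but holds only nulls would still pass this check; catching that needs a look at the values themselves.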
It might also be nice to emit a warning when you try to convert a file that has no geometry, or even disallow it (perhaps with a force option to override). Anyway, I think the situation is OK now, but we could likely help people a bit more. I suspect that for a while we'll see plenty of Parquet files that aren't GeoParquet, and it'd be nice to help people along.
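For convert, the warn-or-refuse behavior with a force override could be sketched as follows. The Go code assumes the candidate geometry column has already been read into a slice where nil represents a null value; checkConvertInput and its force parameter are hypothetical and do not correspond to an existing gpq flag.

```go
package main

import (
	"errors"
	"fmt"
)

// checkConvertInput is a hypothetical guard for convert. It counts non-null
// values in the geometry column (nil means null). With at least one value it
// passes silently; with none it returns an error, or only a warning if force
// is set.
func checkConvertInput(values [][]byte, force bool) (warning string, err error) {
	nonNull := 0
	for _, v := range values {
		if v != nil {
			nonNull++
		}
	}
	if nonNull > 0 {
		return "", nil
	}
	msg := fmt.Sprintf("input has no geometries (%d rows, all null or absent)", len(values))
	if force {
		return "warning: " + msg + "; writing GeoParquet metadata anyway", nil
	}
	return "", errors.New(msg + "; refusing to convert without force")
}

func main() {
	allNull := make([][]byte, 3)
	if _, err := checkConvertInput(allNull, false); err != nil {
		fmt.Println("error:", err)
	}
	if w, _ := checkConvertInput(allNull, true); w != "" {
		fmt.Println(w)
	}
}
```

Defaulting to an error and requiring an explicit override keeps accidental conversions like the one described above from silently producing a "1.0.0" file.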