planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0

Accept URLs for describe, validate, and convert input #98

Closed: tschaub closed this 9 months ago

tschaub commented 9 months ago

This adds support for using URLs in addition to file paths as the input for the describe, validate, and convert commands.
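The Go sketch below shows one way a command could accept a single input argument that is either a local path or an http(s) URL. The helper names `isRemote` and `openInput` are hypothetical and are not gpq's API. It also simplifies: a Parquet reader needs random access (`io.ReaderAt`), so a real implementation would have to buffer the response or use HTTP range requests. This sketch only streams the bytes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// isRemote reports whether input is an http(s) URL rather than a local path.
// Hypothetical helper for illustration; not part of gpq's API.
func isRemote(input string) bool {
	u, err := url.Parse(input)
	if err != nil {
		return false
	}
	// url.Parse lowercases the scheme, so "HTTPS://..." is accepted too.
	return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

// openInput opens a remote URL with an HTTP GET, or a local file otherwise.
// The caller must close the returned reader.
func openInput(input string) (io.ReadCloser, error) {
	if !isRemote(input) {
		f, err := os.Open(input)
		if err != nil {
			return nil, err
		}
		return f, nil
	}
	resp, err := http.Get(input)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("fetching %s: unexpected status %s", input, resp.Status)
	}
	return resp.Body, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: openinput <file-or-url>")
		os.Exit(2)
	}
	r, err := openInput(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := io.Copy(io.Discard, r)
	r.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes from %s\n", n, os.Args[1])
}
```

In this sketch, any other scheme, such as a blob storage URL like `s3://bucket/key.parquet`, falls through to the file-path branch and fails to open. That is consistent with the PR's note that blob storage is not yet supported.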

# gpq describe https://github.com/opengeospatial/geoparquet/raw/v1.0.0/examples/example.parquet
╭────────────────────┬────────┬────────────┬────────────┬─────────────┬──────────┬───────────────────────┬───────────────────────────┬──────────────────────────╮
│ COLUMN             │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION │ ENCODING │ GEOMETRY TYPES        │ BOUNDS                    │ DETAIL                   │
├────────────────────┼────────┼────────────┼────────────┼─────────────┼──────────┼───────────────────────┼───────────────────────────┼──────────────────────────┤
│ pop_est            │ double │            │ 0..1       │ snappy      │          │                       │                           │                          │
│ continent          │ binary │ string     │ 0..1       │ snappy      │          │                       │                           │                          │
│ name               │ binary │ string     │ 0..1       │ snappy      │          │                       │                           │                          │
│ iso_a3             │ binary │ string     │ 0..1       │ snappy      │          │                       │                           │                          │
│ gdp_md_est         │ int64  │            │ 0..1       │ snappy      │          │                       │                           │                          │
│ geometry           │ binary │            │ 0..1       │ snappy      │ WKB      │ Polygon, MultiPolygon │ [-180, -90, 180, 83.6451] │  edges │ planar          │
│                    │        │            │            │             │          │                       │                           │  crs   │ WGS 84 (CRS84)  │
├────────────────────┼────────┴────────────┴────────────┴─────────────┴──────────┴───────────────────────┴───────────────────────────┴──────────────────────────┤
│ Rows               │ 5                                                                                                                                        │
│ Row Groups         │ 1                                                                                                                                        │
│ GeoParquet Version │ 1.0.0                                                                                                                                    │
╰────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

You can also validate a file given its URL, with or without the --metadata-only flag. With the flag, the checks that scan the geometry data are skipped:

# gpq validate https://github.com/opengeospatial/geoparquet/raw/v1.0.0/examples/example.parquet --metadata-only
Summary: Passed 16 checks.

Metadata and schema checks only.  Skipped 4 data scanning checks.

 ✓ file must include a "geo" metadata key
 ✓ metadata must be a JSON object
 ✓ metadata must include a "version" string
 ✓ metadata must include a "primary_column" string
 ✓ metadata must include a "columns" object
 ✓ column metadata must include the "primary_column" name
 ✓ column metadata must include a valid "encoding" string
 ✓ column metadata must include a "geometry_types" list
 ✓ optional "crs" must be null or a PROJJSON object
 ✓ optional "orientation" must be a valid string
 ✓ optional "edges" must be a valid string
 ✓ optional "bbox" must be an array of 4 or 6 numbers
 ✓ optional "epoch" must be a number
 ✓ geometry columns must not be grouped
 ✓ geometry columns must be stored using the BYTE_ARRAY parquet type
 ✓ geometry columns must be required or optional, not repeated

And the same works for the convert command:

# gpq convert https://github.com/opengeospatial/geoparquet/raw/v1.0.0/examples/example.parquet example.geojson

This doesn't yet add support for reading from blob storage. I'll add that separately.

Fixes #93.