planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0

GeoParquet 1.1 validation issues with GeoParquet test data #188

Open cholmes opened 4 months ago

cholmes commented 4 months ago

We just released GeoParquet 1.1. I tried gpq validate on the test data with the native encoding and got a stack trace:

% gpq validate data-multilinestring-encoding_wkb.parquet
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x10 pc=0x10333abbc]

goroutine 1 [running]:
github.com/paulmach/orb/geojson.(*Geometry).Geometry(0x0?)
    /home/runner/go/pkg/mod/github.com/paulmach/orb@v0.11.0/geojson/geometry.go:49 +0x1c
github.com/planetlabs/gpq/internal/validator.(*Validator).Report(0x140008d94e0, {0x1045465a0?, 0x105580e00}, 0x140000533e0)
    /home/runner/work/gpq/gpq/internal/validator/validator.go:242 +0x1354
github.com/planetlabs/gpq/internal/validator.(*Validator).Validate(0x140007a8900?, {0x1045465a0, 0x105580e00}, {0x14cd0cb58?, 0x140004aa998?}, {0x140007a8990, 0x29})
    /home/runner/work/gpq/gpq/internal/validator/validator.go:103 +0x12c
github.com/planetlabs/gpq/cmd/gpq/command.(*ValidateCmd).Run(0x10554a058, 0x140007c3200)
    /home/runner/work/gpq/gpq/cmd/gpq/command/validate.go:47 +0x178
reflect.Value.call({0x1042c6360?, 0x10554a058?, 0x140009bfa78?}, {0x103b02567, 0x4}, {0x14000843638, 0x1, 0x1027b9dd8?})
    /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:596 +0x994
reflect.Value.Call({0x1042c6360?, 0x10554a058?, 0x104273200?}, {0x14000843638?, 0x10451b840?, 0x140008d92b0?})
    /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:380 +0x94
github.com/alecthomas/kong.callFunction({0x1042c6360?, 0x10554a058?, 0x0?}, 0x103b01e47?)
    /home/runner/go/pkg/mod/github.com/alecthomas/kong@v0.8.1/callbacks.go:98 +0x370
github.com/alecthomas/kong.(*Context).RunNode(0x140007c3200, 0x140008002d0, {0x140009bff08, 0x2, 0x140007c7701?})
    /home/runner/go/pkg/mod/github.com/alecthomas/kong@v0.8.1/context.go:765 +0x634
github.com/alecthomas/kong.(*Context).Run(0x104135b40?, {0x140009bff08?, 0x0?, 0x1026a9ea8?})
    /home/runner/go/pkg/mod/github.com/alecthomas/kong@v0.8.1/context.go:790 +0x138
main.main()
    /home/runner/work/gpq/gpq/cmd/gpq/main.go:32 +0x10c
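The trace shows the panic coming from a method call on a nil geometry pointer (the `0x0?` receiver passed to `(*Geometry).Geometry`) inside the validator's `Report` step. The minimal Go sketch below reproduces that class of failure and the usual nil guard. The `Geometry` type, `Kind` method, `describe` helper and `errNoGeometry` value are hypothetical stand-ins, not the orb or gpq API, and this is not presented as the actual fix shipped in gpq.

```go
package main

import (
	"errors"
	"fmt"
)

// Geometry is a hypothetical stand-in for a decoded geometry wrapper.
// It is NOT the github.com/paulmach/orb/geojson type.
type Geometry struct {
	Type        string
	Coordinates []float64
}

// Kind reads a field through its receiver, so calling it on a nil
// *Geometry panics with "invalid memory address or nil pointer dereference".
func (g *Geometry) Kind() string {
	return g.Type
}

// errNoGeometry is returned instead of panicking when decoding produced nothing.
var errNoGeometry = errors.New("no geometry decoded for row")

// describe checks for a nil geometry before calling any method on it.
func describe(g *Geometry) (string, error) {
	if g == nil {
		return "", errNoGeometry
	}
	return g.Kind(), nil
}

func main() {
	// Assumption: a decoder that does not understand an encoding hands back nil.
	var decoded *Geometry

	if _, err := describe(decoded); err != nil {
		fmt.Println("error:", err)
	}

	kind, _ := describe(&Geometry{Type: "MultiLineString"})
	fmt.Println("kind:", kind)
}
```

Returning an error lets a validator report a failed check for the row instead of aborting the whole run.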

I got similar results on the 'wkb' test data, but it worked fine on the main GeoParquet 1.1 example. I also generated a 1.1 file with GDAL using Arrow support (a plain conversion without Arrow didn't seem to make GDAL write 1.1). That file didn't cause a stack trace, and the output was what I'd expect from a validator not yet updated for 1.1:

Summary: Passed 12 checks, failed 3 checks, 5 checks not run.

 ✓ file must include a "geo" metadata key
 ✓ metadata must be a JSON object
 ✓ metadata must include a "version" string
 ✓ metadata must include a "primary_column" string
 ✓ metadata must include a "columns" object
 ✓ column metadata must include the "primary_column" name
 ✗ column metadata must include a valid "encoding" string
   ↳ unsupported encoding "point" for column "geom"
 ✓ column metadata must include a "geometry_types" list
 ✓ optional "crs" must be null or a PROJJSON object
 ✓ optional "orientation" must be a valid string
 ✓ optional "edges" must be a valid string
 ✓ optional "bbox" must be an array of 4 or 6 numbers
 ✓ optional "epoch" must be a number
 ✗ geometry columns must not be grouped
   ↳ column "geom" must not be a group
 ✗ geometry columns must be stored using the BYTE_ARRAY parquet type
   ↳ expected primitive column for "geom"
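All three failures follow from the native encoding: GeoParquet 1.1 allows "WKB" plus the native encodings "point", "linestring", "polygon", "multipoint", "multilinestring" and "multipolygon", and native columns are stored as nested Parquet groups rather than a single BYTE_ARRAY. The Go sketch below shows what a 1.1-aware encoding check could look like, assuming exact string matching. The `checkEncoding` and `isNative` helpers are hypothetical and not part of gpq; they only illustrate the idea that grouping and BYTE_ARRAY checks would apply to WKB columns alone.

```go
package main

import (
	"fmt"
)

// validEncodings lists the column "encoding" values allowed by
// GeoParquet 1.1: "WKB" plus the native single-geometry-type encodings.
var validEncodings = map[string]bool{
	"WKB":             true,
	"point":           true,
	"linestring":      true,
	"polygon":         true,
	"multipoint":      true,
	"multilinestring": true,
	"multipolygon":    true,
}

// checkEncoding is a hypothetical helper mirroring the
// `column metadata must include a valid "encoding" string` check.
func checkEncoding(column, encoding string) error {
	if !validEncodings[encoding] {
		return fmt.Errorf("unsupported encoding %q for column %q", encoding, column)
	}
	return nil
}

// isNative reports whether the encoding stores coordinates as nested
// Parquet groups instead of a BYTE_ARRAY of WKB, in which case the
// "must not be grouped" and BYTE_ARRAY checks should be skipped.
func isNative(encoding string) bool {
	return validEncodings[encoding] && encoding != "WKB"
}

func main() {
	for _, enc := range []string{"WKB", "point", "geojson"} {
		if err := checkEncoding("geom", enc); err != nil {
			fmt.Println("✗", err)
			continue
		}
		fmt.Printf("✓ %s (native: %v)\n", enc, isNative(enc))
	}
}
```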

The gpq describe commands all worked well, even with Arrow, which was nice.

tschaub commented 4 months ago

Thanks for the report. The v0.23.0 release fixes the panic above. With that change, gpq validate should work against GeoParquet v1.1 files using WKB.

I'll work separately on support for validating files using the new native geometry encodings.