planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0

Not able to convert geojson files when the schema is not inferrable #182

Open clembou opened 4 weeks ago

clembou commented 4 weeks ago

Hi,

I am experiencing this issue with gpq:

gpq: error: failed to create schema after reading 39 features

Based on https://github.com/planetlabs/gpq/issues/142 the cause is clear: one of the columns has no non-null values in any of the features. Indeed, if I edit the file and add just one non-null value, everything works fine.

The problem is that, unlike in the linked issue, I cannot simply increase the number of rows scanned, because every row has a null in that column, and this situation is pretty common in the files I am dealing with.
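To illustrate, a file shaped like this (property names invented) fails the same way, because `flag` is null in every feature:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [0, 0]},
      "properties": {"id": 1, "flag": null}
    },
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [1, 1]},
      "properties": {"id": 2, "flag": null}
    }
  ]
}
```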

While this strict behaviour is understandable as a default, it is preventing me from adopting the tool. The ogr2ogr behaviour is perhaps questionable (in my case the offending column is written as a string instead of an int), but it at least produces usable output.
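(For comparison, the ogr2ogr run I mean is just something along these lines, assuming a GDAL build that includes the Parquet driver:)

```
ogr2ogr -f Parquet output.parquet input.geojson
```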

So perhaps an option like --drop-non-inferrable-columns or --import-ambiguous-columns-as-strings would be a useful escape hatch for gpq users. (Pre-processing the JSON is of course an option too, but more involved; see the sketch below.)
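For what it's worth, the kind of pre-processing I have in mind is roughly this (only a sketch, assuming a plain FeatureCollection; it drops every property that is null in all features, which is what a --drop-non-inferrable-columns flag would effectively do):

```python
import json
import sys

# Read a GeoJSON FeatureCollection and drop any property that is null
# in every feature, so gpq can infer a type for the remaining columns.
with open(sys.argv[1]) as f:
    fc = json.load(f)

features = fc.get("features", [])

# Collect property names that have at least one non-null value.
inferrable = set()
for feat in features:
    for key, value in (feat.get("properties") or {}).items():
        if value is not None:
            inferrable.add(key)

# Remove the all-null properties from every feature.
for feat in features:
    props = feat.get("properties") or {}
    feat["properties"] = {k: v for k, v in props.items() if k in inferrable}

json.dump(fc, sys.stdout)
```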

tschaub commented 3 weeks ago

I agree that there should be a way to handle this. OGR will encode these “unknown” types as JSON strings.

In your case, I imagine you would want an optional integer type. The challenge is coming up with command-line argument syntax that is convenient and flexible. Referring to a secondary file with the schema or other complex options might be nicer.
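Purely as an illustration (nothing like this exists in gpq today, and the column name is the made-up one from the example above), such a side-car file might look like:

```json
{
  "columns": {
    "flag": {
      "type": "int64",
      "nullable": true
    }
  }
}
```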

clembou commented 3 weeks ago

Being able to specify the schema would be nice for a perfect output, yes. I think even just optionally treating unknown fields as strings, or dropping non-inferrable columns altogether, would be a nice improvement (and acceptable for my personal use case).