Closed cholmes closed 1 year ago
@cholmes
I've downloaded the admin data and parsed it through DuckDB
db.execute ("""
COPY (
select *
from '**/*.parquet'
WHERE adminLevel = 2
AND isocountrycodealpha2 is not null
) TO 'admin-countries.parquet'
""")
With this I can then convert to geoparquet using gpq.
I guess this should just work without the need to use DuckDB though?
@mtravis - funny, I just came here to make the same comment, as I had noticed that too.
Yeah, running it through DuckDB in most any way seems to work fine, so it seems to not be anything fundamental with the structure of that data.
I get an error trying to read this file using the Arrow libs directly. I've ticketed this as https://github.com/apache/arrow/issues/37968.
I'll work on trying to narrow it down.
This now works in the latest release. If using brew, you can run brew update && brew install planetlabs/tap/gpq to install the latest, and gpq version to see which version you have installed.
# the file above is now converted to valid geoparquet
gpq convert overture.parquet --to geoparquet | gpq validate
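For anyone scripting this from Python rather than a shell, here is a minimal sketch of the same convert-then-validate pipe. It uses only the gpq invocations shown above (gpq convert ... --to geoparquet piped into gpq validate). The helper names gpq_pipeline and convert_and_validate are made up for illustration, and the sketch assumes gpq signals failure through a non-zero exit code.

```python
import shutil
import subprocess


def gpq_pipeline(path):
    """Return the argv lists for `gpq convert <path> --to geoparquet | gpq validate`."""
    convert = ["gpq", "convert", path, "--to", "geoparquet"]
    validate = ["gpq", "validate"]
    return convert, validate


def convert_and_validate(path):
    """Pipe gpq convert into gpq validate (hypothetical helper).

    Returns None when gpq is not on PATH; otherwise the first non-zero
    exit code of the two processes, or 0 if both succeeded.
    """
    if shutil.which("gpq") is None:
        return None
    convert_cmd, validate_cmd = gpq_pipeline(path)
    convert = subprocess.Popen(convert_cmd, stdout=subprocess.PIPE)
    validate = subprocess.Popen(validate_cmd, stdin=convert.stdout)
    # Close our copy of the pipe so convert sees SIGPIPE if validate exits early.
    convert.stdout.close()
    validate.wait()
    convert.wait()
    return convert.returncode or validate.returncode
```

Calling convert_and_validate("overture.parquet") mirrors the one-liner above. Building the argument lists in a separate function keeps the command itself easy to inspect without running anything.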
In case it is of interest to Overture users, I opened a discussion about the Parquet schema here: https://github.com/OvertureMaps/schema/discussions/55
Basically, the current schema for names and sources is not as specific as it could be (allowing arbitrary properties for names, for example, instead of restricting it to the common, official, alternate, and short described in the JSON Schema). If you think a more specific schema would be harmful or helpful, please chime in.
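To make the distinction concrete, here is a rough sketch of what "restricting" names would mean in practice. It is not the Overture schema itself: the four allowed keys come from the comment above, while the helper name unexpected_name_keys, the sample record, and the shape of its values are assumptions for illustration only.

```python
# Keys named in the comment above as the ones described in the JSON Schema.
ALLOWED_NAME_KEYS = {"common", "official", "alternate", "short"}


def unexpected_name_keys(names):
    """Return the sorted keys of `names` that fall outside the allowed set."""
    return sorted(set(names) - ALLOWED_NAME_KEYS)


# Hypothetical record: an open-ended schema accepts "nickname",
# a restricted one would reject it.
sample = {"common": [], "official": [], "nickname": []}
extra = unexpected_name_keys(sample)
```

With an open-ended schema the extra key passes through silently. A more specific schema would surface it at write time, which is the trade-off the discussion is asking about.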
The new Overture Maps data is in Parquet with WKB geometry, but when I try to convert it I get:
Sample data is at https://storage.googleapis.com/open-geodata/ch/20230725_211237_00132_5p54t_3b7d7eb3-dd9c-442a-a9b9-404dc936c5d9