bdon closed this issue 10 months ago.
Hey @bdon - nice idea. I put together #79 to make all the commands optionally work with stdin/stdout.
If you omit the output arg in the convert command, it writes to stdout. That isn't as explicit as a --stdout arg, but hopefully it isn't too tricky.
All together!
curl https://data.source.coop/cholmes/google-open-buildings/geoparquet-admin1/country=EGY/Cairo_Governorate.parquet | ./gpq convert --from=geoparquet --to=geojson | tippecanoe -o buildings.pmtiles --force --drop-densest-as-needed
Included in the v0.15.0 release (brew update && brew install planetlabs/tap/gpq, or download from the release page).
@bdon - you'll probably notice that this needs to buffer the whole file, since the Parquet metadata is in the footer. But that suggests another enhancement: accepting a URL for the input. Then, if ranged reads are supported, the metadata could be read first (and then maybe only one data page at a time would need to be buffered).
@tschaub, have you looked into using https://gocloud.dev for reading Parquet?
For https://github.com/protomaps/go-pmtiles/blob/main/pmtiles/extract.go#L276 I use only the blob functionality, which supports GCP, Azure, and S3-compatible blob storage with credentials out of the box. I had to add a layer of abstraction to handle public, unauthenticated HTTP URLs, but it was otherwise simple.
I've used similar libs, but not gocloud.dev yet; I'll check it out.
My ideal would be a multi-cloud blob reader that implemented io.ReadSeeker and io.ReaderAt (I know this isn't efficient for all providers, but it is possible, with lots of guessing about how much to buffer for the seeker reads).
For PMTiles, the extract code uses bucket.NewRangeReader without any guessing: it downloads the entire (compressed) relevant part of the index in advance, then pre-merges request ranges to avoid thousands of small requests before fetching any actual "features" (tiles).
Would GeoParquet need similar batching to be efficient? I haven't delved deeply into actual reader implementations yet.
I'd like to do something like this:
gpq convert Cairo_Governorate.parquet --stdout --to=geojson | tippecanoe -o Cairo_Governorate.pmtiles --drop-densest-as-needed
Would this functionality be useful? It would require some changes in convert.go to allow for a blank positional output argument.