planetlabs / gpq

Utility for working with GeoParquet
https://planetlabs.github.io/gpq/
Apache License 2.0

Question: Is there a plan to expose functionality as library code? #113

Open thisisaaronland opened 12 months ago

thisisaaronland commented 12 months ago

Hi,

I am interested in using gpq to generate GeoParquet files for Who's On First (WOF) data. Ideally I would like to do that by reading and writing data on a per-record basis rather than starting with a single GeoJSON file.

Poking through the code, it appears I can stream data to gpq via STDIN, which would allow me to use an approach similar to how we derive PMTiles from WOF data.

That would solve my immediate problem, but the functionality wrapped by the gpq command, specifically the convert functionality, would be generally useful to have as library code (outside of internal).

tschaub commented 12 months ago

Hi @thisisaaronland - thanks for reaching out about this. Yes, I think it makes sense to expose packages with functions for generating GeoParquet data.

If you have ideas about the ideal API that you'd like to use, maybe you can drop them here and we can discuss. I'm curious in particular about whether you would want to provide a Parquet (or Arrow) schema up front or if you would like this to be derived from the data.

thisisaaronland commented 12 months ago

Hi @tschaub

For starters I am not super knowledgeable about Parquet or Arrow but I have been watching the conversations around geoparquet and so this was an exercise to start getting more familiar and to prove that WOF data could be bundled in a new format. (One of the unofficial mottoes of the WOF project is: We don't need to have an opinion about your database :-)

The first thing I'd like to be able to do is write a go-writer-geoparquet package that implements the whosonfirst/go-writer.Writer interface:

https://pkg.go.dev/github.com/whosonfirst/go-writer#Writer

A concrete example of that would be the go-writer-featurecollection package:

https://github.com/whosonfirst/go-writer-featurecollection/blob/main/featurecollection.go

That would allow me to continue to use a common set of interfaces for writing WOF documents to a variety of targets and encapsulate all the Parquet/Arrow specific details in the constructor and the URI used to create it.

Based on the short amount of time I've spent spelunking through the gpq code it seems like just making the internal/geo* packages public might be enough.

thisisaaronland commented 5 months ago

Hi,

Just checking in on this. Has there been any (more) thought about exposing the code in internal as public library code?

tschaub commented 5 months ago

I haven't made any time for this yet unfortunately.