Seems like it might be time to start a 'best practices' document for topics that are outside the spec but would be good for people to know about.
Remembered this when reading #79.
Potential ideas to include:
What compression to use (zstd, snappy, brotli, etc). Talk through how not all parquet implementations support all compression codecs, and also how to think about the compression time vs file size tradeoff. Perhaps some discussion of what works best for geospatial / common geo use cases.
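To make the time vs size tradeoff concrete, here is a rough sketch of how one might measure it. It is only an illustration: it uses `zlib` from the Python standard library at different levels as a stand-in, not any of the parquet codecs listed above, and the random coordinate data, the function names, and the chosen levels are all assumptions. With a real writer you would swap the codec and rerun the same measurement on your own data.

```python
import random
import time
import zlib
from array import array


def make_coordinate_bytes(n_points, seed=0):
    """Build a raw buffer of float64 lon/lat pairs (synthetic, illustrative data)."""
    rng = random.Random(seed)
    values = array("d")
    for _ in range(n_points):
        values.append(round(rng.uniform(-180.0, 180.0), 5))
        values.append(round(rng.uniform(-90.0, 90.0), 5))
    return values.tobytes()


def measure(data, levels=(1, 6, 9)):
    """Compress `data` at each zlib level and record output size and elapsed time."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Check the round trip so the numbers describe a lossless result.
        assert zlib.decompress(compressed) == data
        results[level] = {"bytes": len(compressed), "seconds": elapsed}
    return results


data = make_coordinate_bytes(50_000)
results = measure(data)
for level, stats in results.items():
    ratio = stats["bytes"] / len(data)
    print(f"level={level} size_ratio={ratio:.3f} seconds={stats['seconds']:.4f}")
```

The actual numbers depend heavily on the data and the machine, which is probably the main point a best practices doc would want to make: measure on a representative sample rather than trusting a general rule.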
Discussion of spatial ordering - explain how the bbox column works best when you've used an R-tree or some other spatial index to sort your data, point at what different implementations do, etc. It makes sense to keep the spec barebones and flexible, but it would be nice to provide more explanation and guidance for those who are making datasets.
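As a rough illustration of why ordering matters for the bbox column, the sketch below sorts points by a Z-order (Morton) key and compares the bounding boxes of fixed-size chunks before and after sorting. Note the assumptions: Z-order is used here only because it is short to write, not because it is what any implementation does (an R-tree or Hilbert curve would be alternatives), and the chunk size of 500, the random points, and all function names are made up for the example. Chunks stand in for row groups.

```python
import random


def morton_key(x, y, bits=16):
    """Interleave the bits of quantised lon/lat into a single Z-order key."""
    scale = (1 << bits) - 1
    xi = int((x + 180.0) / 360.0 * scale)
    yi = int((y + 90.0) / 180.0 * scale)
    key = 0
    for i in range(bits):
        key |= ((xi >> i) & 1) << (2 * i)
        key |= ((yi >> i) & 1) << (2 * i + 1)
    return key


def group_bboxes(points, group_size):
    """Return (minx, miny, maxx, maxy) for each consecutive chunk of points."""
    boxes = []
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def total_area(boxes):
    """Sum of bbox areas; a smaller total means chunks are easier to skip."""
    return sum((b[2] - b[0]) * (b[3] - b[1]) for b in boxes)


rng = random.Random(42)
points = [(rng.uniform(-180.0, 180.0), rng.uniform(-90.0, 90.0)) for _ in range(10_000)]

unsorted_area = total_area(group_bboxes(points, 500))
sorted_points = sorted(points, key=lambda p: morton_key(p[0], p[1]))
sorted_area = total_area(group_bboxes(sorted_points, 500))

print(f"unsorted total bbox area: {unsorted_area:.0f}")
print(f"z-order  total bbox area: {sorted_area:.0f}")
```

With unsorted data every chunk's bbox covers nearly the whole extent, so a bbox filter cannot rule any chunk out. After sorting, most chunks cover a small area, which is what lets a reader skip them.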
Partitioning - we need to figure out the _metadata files in #79, and a best practices doc likely makes sense. But also just a more general discussion of when to split up parquet files, and things to consider when splitting them up - admin boundaries vs bbox vs ...
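For the bbox-based option, one simple approach (a sketch only, not a recommendation) is to assign each feature to a fixed grid cell using the centre of its bbox and write one file per cell. The 10 degree cell size, the feature ids, and the function names below are hypothetical, and how such files should be named or described in `_metadata` is exactly the open question in #79, so none of that is shown.

```python
import math
from collections import defaultdict


def grid_cell(bbox, cell_deg=10.0):
    """Return the (column, row) grid cell containing the centre of a bbox."""
    minx, miny, maxx, maxy = bbox
    cx = (minx + maxx) / 2.0
    cy = (miny + maxy) / 2.0
    return (math.floor(cx / cell_deg), math.floor(cy / cell_deg))


def partition_by_grid(features, cell_deg=10.0):
    """Group feature ids by grid cell; each group would become one file."""
    partitions = defaultdict(list)
    for feature_id, bbox in features:
        partitions[grid_cell(bbox, cell_deg)].append(feature_id)
    return dict(partitions)


# Hypothetical features as (id, (minx, miny, maxx, maxy)) in lon/lat degrees.
features = [
    ("a", (2.2, 48.8, 2.4, 48.9)),
    ("b", (2.9, 50.6, 3.1, 50.7)),
    ("c", (-74.1, 40.6, -73.9, 40.8)),
]
partitions = partition_by_grid(features)
print(partitions)
```

A grid gives predictable cells but uneven file sizes where data is dense, whereas admin boundaries match how people often query but vary a lot in size and shape. That tension seems like the core thing for the doc to discuss.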
The filename extension recommendation (#212) would arguably fit in a best practices doc (though I think keeping it in the spec is fine).
Other suggestions here are welcome. I'm not the expert on these, but happy to take a crack at drafting something that others could improve.