Open rabernat opened 3 years ago
I'm sitting here with @einatlev-ldeo at the EarthCube Annual Meeting in La Jolla. We are discussing if/how we may be able to provide cloud-optimized access to (at least some subset of) the data provided on
via Pangeo Forge.
Based on our discussions, it seems that this may be a great use case for a Parquet recipe. It strikes me that once we complete the work scoped in https://github.com/pangeo-forge/pangeo-forge-recipes/issues/376, writing a Parquet recipe should be quite approachable (really just a few additional PTransforms).
While we're waiting for the first phase of the Beam work to complete, perhaps we can start brainstorming what data objects would make sense to assemble from these raw data. For example, are there one or more sets of variables with the same time resolution that could all fit together in a single large table? If so, what are those variables and their access paths on the file server? Can we assemble a demonstration CSV from them using a simple standalone Python script? If so, that would be a very useful basis for building a larger table with Pangeo Forge.
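To make that last question concrete, here is a rough sketch of what such a standalone script might look like, using only the standard library. Everything specific in it is an assumption for illustration: the variable names (`temperature`, `pressure`), the `time`/`value` column layout, and the in-memory placeholder inputs stand in for whatever the real files on the server look like. It joins the per-variable tables on their shared timestamps and keeps only the timestamps present in every source.

```python
import csv
import io


def merge_on_time(sources, time_col="time"):
    """Join several single-variable CSV tables on a shared timestamp column.

    `sources` maps a variable name to an open text file (or any iterable of
    CSV lines) whose header has `time_col` plus one value column.
    Returns (header, rows), keeping only timestamps found in every source.
    """
    tables = {}
    for name, handle in sources.items():
        reader = csv.DictReader(handle)
        # The value column is whichever column is not the timestamp.
        value_col = next(c for c in reader.fieldnames if c != time_col)
        tables[name] = {row[time_col]: row[value_col] for row in reader}

    # Inner join: timestamps common to all variables.
    common = set.intersection(*(set(t) for t in tables.values()))
    header = [time_col, *tables]
    # Sorting strings is chronological only if timestamps share one ISO format.
    rows = [[t, *(tables[name][t] for name in tables)] for t in sorted(common)]
    return header, rows


def write_csv(header, rows, handle):
    """Write the merged table as CSV to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical placeholder inputs; real inputs would be files opened from
# the data server, and the values here are made up.
temperature = io.StringIO(
    "time,value\n2022-06-01T00:00,21.5\n2022-06-01T01:00,21.9\n"
)
pressure = io.StringIO(
    "time,value\n2022-06-01T00:00,1012.1\n2022-06-01T01:00,1011.8\n"
    "2022-06-01T02:00,1011.2\n"
)

header, rows = merge_on_time({"temperature": temperature, "pressure": pressure})
out = io.StringIO()
write_csv(header, rows, out)
demo_csv = out.getvalue()
```

An inner join is just one possible choice here. If the variables turn out to have slightly different sampling times, an outer join with missing values or a resampling step may be a better fit, and that is part of what we would need to work out from the raw data.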
Side note: there's some awesome webcam data available through the same project. I wonder what ARCO format might be suitable for webcam time series data?
Just FYI, I have some notes on how we think about tabular data for the Planetary Computer: https://gist.github.com/TomAugspurger/457a2288f6ef7490ab87546faf665e14
Thanks Tom, this is great!
Thank you!
So far we basically only have NetCDF-to-Zarr recipes (or, more generally, recipes for anything Xarray can read, e.g. GRIB).
Some recipes will want to work with tabular data, e.g. transforming a collection of CSVs to Parquet. (Example: https://github.com/pangeo-forge/staged-recipes/issues/3)
This will require an entirely new recipe class. Creating this class will force us to refactor the recipe module significantly. This will be laborious but hopefully relatively straightforward.