pangeo-forge / pangeo-forge-recipes

Python library for building Pangeo Forge recipes.
https://pangeo-forge.readthedocs.io/
Apache License 2.0

metadata-only outputs #70

Open martindurant opened 3 years ago

martindurant commented 3 years ago

The typical workflow discussed in most other issues takes some dataset and transforms it into Zarr format for storage, with a number of options for how to go about doing that.

Here I want to note an alternative, where the final data product is still loaded from the original source, and the output of the pangeo-forge process is a prescription for how to load it. The principal use cases for this are:

There are two broad categories of data access considered, for now

rabernat commented 3 years ago

I think this is a great pattern we should definitely work to support! 👍 These recipes will generally be a bit cheaper to run because they don't have to copy much data.

Please feel free to take a stab at implementing such a recipe class. It would be good to have an issue in staged-recipes to point to a specific dataset we can use as a user story.

martindurant commented 3 years ago

cc @tam203 - you might eventually be interested in encoding your datasets into pangeo-forge recipes or, more simply, including your existing catalogue prescriptions. I have not yet had the chance to look through the code of Hypothetic in detail, so I don't yet have a good mental model of its components (filename convention versus zarr chunk key; zarr storage; intake driver; download/cache layer).

@rabernat The simplest case to encode would be the existing example in https://github.com/intake/fsspec-reference-maker/blob/main/examples/intake_catalog.yml, and the reference file specified therein. The recipe would essentially rerun that scan to pick up the latest capabilities of fsspec-reference-maker as it evolves.
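
To make the "prescription" idea concrete, here is a minimal, standard-library-only sketch of a metadata-only output. It is not the fsspec-reference-maker implementation or its API: the function names `build_references` and `read_key` are hypothetical, and the layout (a `refs` mapping whose values are either inline strings or `[path, offset, length]` triples) is only loosely modeled on the reference files discussed above, which is an assumption on my part. The point is that the recipe output holds byte ranges into the original file, not a copy of the data.

```python
import json


def build_references(source_path, chunk_size, n_chunks):
    """Describe a file as fixed-size chunks without copying any data.

    Hypothetical helper: the output layout is an assumption, not a
    confirmed specification of any existing reference-file format.
    """
    # Small metadata is stored inline as a string.
    refs = {
        ".zarray": json.dumps(
            {"chunks": [chunk_size], "shape": [chunk_size * n_chunks]}
        )
    }
    # Each chunk key points at a byte range in the original source.
    for i in range(n_chunks):
        refs[str(i)] = [source_path, i * chunk_size, chunk_size]
    return {"version": 1, "refs": refs}


def read_key(references, key):
    """Resolve one key, reading from the original source when needed."""
    ref = references["refs"][key]
    if isinstance(ref, str):
        # Inline content: return it directly as bytes.
        return ref.encode()
    path, offset, length = ref
    # Referenced content: read only the requested byte range.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The dictionary returned by `build_references` can be serialized with `json.dumps` and stored as the entire recipe output, and a reader then calls `read_key` to fetch chunks lazily. A real recipe would point at remote URLs and use the scan from fsspec-reference-maker to find the byte ranges; local paths and fixed-size chunks are used here only to keep the sketch self-contained.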