Open martindurant opened 3 years ago
I think this is a great pattern we should definitely work to support! 👍 These recipes will generally be a bit cheaper to run because they don't have to copy much data.
Please feel free to take a stab at implementing such a recipe class. It would be good to have an issue in staged-recipes to point to a specific dataset we can use as a user story.
cc @tam203 - you might eventually be interested in encoding your datasets into pangeo-forge recipes or, more simply, including your existing catalogue prescriptions. I have not yet had the chance to look through the code of Hypothetic in detail, so I don't yet have a good mental model of its components (filename convention versus zarr chunk key; zarr storage; intake driver; download/cache layer).
@rabernat The simplest case to encode would be the existing example in https://github.com/intake/fsspec-reference-maker/blob/main/examples/intake_catalog.yml , and the reference file specified therein. The recipe would essentially repeat that scan for the latest capabilities of fsspec-reference-maker as it evolves.
The typical workflow discussed in most other issues involves taking some dataset and transforming it into zarr format for storage, with a number of options for how to go about doing that.
Here I want to note an alternative, where the final data product is still loaded from the original source, and the output of the pangeo-forge process is a prescription for how to load it. The principal use cases for this are:
There are two broad categories of data access considered, for now
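
To make the idea of a "prescription" concrete, here is a minimal sketch using only the Python standard library. It is not the fsspec-reference-maker API: the helper names (`build_references`, `read_chunk`), the `data/<n>` key scheme, and the fixed-size chunking are all assumptions for illustration. The point is only that the output is a small JSON mapping from chunk keys to byte ranges in the original file, and no data is copied.

```python
import json
import os
import tempfile


def build_references(source_path, chunk_size):
    """Map hypothetical chunk keys to [path, offset, length] triples.

    Nothing is copied: each entry only records where the bytes for that
    chunk live inside the original file.
    """
    size = os.path.getsize(source_path)
    refs = {}
    for i, offset in enumerate(range(0, size, chunk_size)):
        length = min(chunk_size, size - offset)
        refs[f"data/{i}"] = [source_path, offset, length]
    return refs


def read_chunk(refs, key):
    """Resolve a chunk key by reading its byte range from the original source."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Stand-in for an original data file (10 bytes: 0, 1, ..., 9).
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(bytes(range(10)))
    source = tmp.name

refs = build_references(source, chunk_size=4)

# The "recipe output" is just this small JSON document, not a copy of the data.
prescription = json.dumps(refs)
restored = json.loads(prescription)

chunk = read_chunk(restored, "data/1")
os.remove(source)
```

In a real recipe the source would presumably be a remote URL and the offsets would come from scanning the file's internal layout rather than from fixed-size slicing, but the shape of the output would be similar.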