A prototype feedstock that implements independent metadata and data updates using pangeo-forge
Edit feedstock/recipe.py to build your pangeo-forge recipe. If you are new to pangeo-forge, the docs are a great starting point.
The recipe and its supporting files live in the feedstock/ directory. More info on feedstock structure can be found in the pangeo-forge documentation.
Before we run your recipe on LEAP's Dataflow runner, you should test your recipe locally.
You can do that on the LEAP-Pangeo Jupyterhub or your own computer.
Set up an environment with mamba or conda:
mamba create -n runner0102 python=3.11 -y
conda activate runner0102
pip install pangeo-forge-runner==0.10.2 --no-cache-dir
You can now use pangeo-forge-runner from the root directory of this repository in the terminal:
pangeo-forge-runner bake \
  --repo=./ \
  --ref=main \
  --feedstock-subdir='feedstock' \
  --Bake.job_name=<recipe_id> \
  --Bake.recipe_id=<recipe_id> \
  -f config_local.py
[!NOTE] Make sure to replace <recipe_id> with the one defined in your feedstock/meta.yaml file. If you created multiple recipes, you have to run a command like the one above for each of them.
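When a feedstock defines several recipes, the repeated calls can be generated rather than typed by hand. The following is a minimal sketch, not part of this repository: the helper name build_bake_command and the two example ids are hypothetical, and the flags are copied from the command above.

```python
import shlex


def build_bake_command(recipe_id, config_file="config_local.py"):
    """Return the argument list for one local `pangeo-forge-runner bake` call."""
    return [
        "pangeo-forge-runner",
        "bake",
        "--repo=./",
        "--ref=main",
        "--feedstock-subdir=feedstock",
        # The job name and the recipe id are set to the same value, as above.
        f"--Bake.job_name={recipe_id}",
        f"--Bake.recipe_id={recipe_id}",
        "-f",
        config_file,
    ]


if __name__ == "__main__":
    # Hypothetical ids; use the ones defined in feedstock/meta.yaml.
    recipe_ids = ["recipe-a", "recipe-b"]
    for recipe_id in recipe_ids:
        # Print one copy-pasteable command per recipe.
        print(shlex.join(build_bake_command(recipe_id)))
```

To execute the calls instead of printing them, each argument list can be passed to subprocess.run(..., check=True) from the root directory of this repository.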
[!TIP] By default, the above command will 'prune' the recipe, meaning it will only use two of the input files you provided, to avoid creating overly large output. Keep that in mind when you check the output for correctness.
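To check the pruned output, it helps to first locate the stores that the local run wrote. The sketch below makes two assumptions that are not stated in this README: that the output stores are directories whose names end in .zarr, and that they sit somewhere below a single output directory. The path "./output" is a placeholder; the real target location is whatever config_local.py configures.

```python
from pathlib import Path


def find_zarr_stores(output_root):
    """Return all directories ending in `.zarr` below output_root, sorted."""
    root = Path(output_root)
    if not root.is_dir():
        # Nothing has been written yet, or the path is wrong.
        return []
    return sorted(p for p in root.rglob("*.zarr") if p.is_dir())


if __name__ == "__main__":
    # "./output" is a placeholder; use the target path set in config_local.py.
    for store in find_zarr_stores("./output"):
        print(store)
```

Each store found this way can then be opened, for example with xarray.open_zarr if xarray is installed, to inspect variables and dimensions. Because of pruning, the dataset will only cover the first two input files.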
Once you are happy with the output, it is time to commit your work to git, push to GitHub, and get this recipe set up for ingestion using Google Dataflow.
Pre-commit linting is already configured in this repository. To run the checks locally, simply do:
pre-commit install
pre-commit run --all-files
Then create a new branch and add those fixes (and any others that could not be fixed automatically). From now on, pre-commit will run its checks on every commit.
Alternatively (or additionally) you can use the pre-commit CI GitHub App to run these checks as part of every PR.
To proceed with this step you will need assistance from a member of the LEAP Data and Computation Team. Please open an issue on this repository, tag @leap-stc/data-and-compute, and ask for this repository to be added to the pre-commit.ci app.
[!WARNING] To proceed with this step you will need to have certain repository secrets set up. For security reasons this should be done by a member of the LEAP Data and Computation Team. Please open an issue on this repository and tag @leap-stc/data-and-compute to get assistance.
To deploy a recipe to Google Dataflow, you have to trigger the "Deploy Recipes to Google Dataflow" workflow with a single recipe_id as input.
Now that your awesome dataset is available as an ARCO zarr store, you should make sure that everyone else at LEAP can check this dataset out easily.
TBW: Instructions on how to edit feedstock/catalog.yaml