Security & importing contributed recipes

User Profile

As a project owner

User Action

I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes

User Goal

So that I know what security guardrails to observe while developing new features on Pangeo Forge Cloud

Acceptance Criteria

An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details on the motivating cases are in the Linked Issues section below.

Linked Issues

By way of background, there are currently two places in the Registrar where we automatically create recipe runs in response to a push event:

For recipes in a PR commit
For recipes pushed to the default branch of a merged feedstock

In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.

What I'd like to discuss here is the first case, wherein the submitted code is truly untrusted: literally anyone in the world can make a PR to /staged-recipes, and if it has a properly formatted and complete meta.yaml, then recipe runs will be created for all recipes listed in that meta.yaml. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.

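Certain open User Stories challenge this model, however. Namely:

https://github.com/pangeo-forge/user-stories/issues/3
https://github.com/pangeo-forge/user-stories/issues/10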

In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as https://github.com/pangeo-forge/user-stories/issues/3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call self.sha256() on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for https://github.com/pangeo-forge/user-stories/issues/10, we wouldn't know the names of the individual recipes within a dict_object without importing the recipe module and introspecting the specified dictionary.
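To make the hash comparison concrete, here is a minimal sketch of the re-run decision. It assumes we already have a hex digest per recipe, which in practice would come from calling sha256() on the imported recipe objects (the step that raises the security question). The function name `recipes_needing_runs`, the recipe ids, and the stand-in hashes are all hypothetical and not part of the Registrar.

```python
import hashlib
from typing import Dict, List


def recipes_needing_runs(
    current_hashes: Dict[str, str],
    prior_hashes: Dict[str, str],
) -> List[str]:
    """Return ids of recipes whose hash is new or differs from the prior run.

    A recipe with no prior run has no entry in ``prior_hashes`` and is
    therefore always selected.
    """
    return [
        recipe_id
        for recipe_id, digest in current_hashes.items()
        if prior_hashes.get(recipe_id) != digest
    ]


def _stand_in_hash(text: str) -> str:
    # Stand-in for a real recipe hash; only used to build example data.
    return hashlib.sha256(text.encode()).hexdigest()


current = {
    "recipe-a": _stand_in_hash("a-v2"),  # changed since the prior run
    "recipe-b": _stand_in_hash("b-v1"),  # unchanged
    "recipe-c": _stand_in_hash("c-v1"),  # never run before
}
prior = {
    "recipe-a": _stand_in_hash("a-v1"),
    "recipe-b": _stand_in_hash("b-v1"),
}

print(recipes_needing_runs(current, prior))  # ['recipe-a', 'recipe-c']
```

The comparison itself is trivial and safe; the open question is only how to obtain `current_hashes` for untrusted code.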

Both of these User Stories have real, already-existing contributors who would like to use them, and from a design perspective both would be big improvements to the platform. They would also be specifically useful for the low-trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.

A few further questions/possibilities to kick off discussion:

Is there some importlib equivalent to yaml.safe_load which might be useful in this case?
One obvious option is to require a maintainer's approval to create recipe runs (rather than generating them automatically), but this feels (1) very un-ergonomic and tedious, and (2) actually not that safe, because maintainers juggling lots of other tasks could potentially be fooled by phishing-style slight typos on import paths or the like.
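On the first question: as far as I know, importlib has no safe mode, but the standard library's ast module can parse a module's source without executing it. Below is a rough sketch of how the recipe names in a dictionary might be read statically. The function name `static_dict_keys` and the example module are hypothetical, and the approach only works when the dictionary is a top-level literal with string-literal keys; a dictionary built by a loop or comprehension would return None. It also does not help with hashing, since sha256() needs the live recipe object.

```python
import ast
from typing import List, Optional


def static_dict_keys(source: str, name: str) -> Optional[List[str]]:
    """Return the string keys of a top-level dict literal assigned to ``name``.

    The source is parsed, never executed. Returns None if the name is not
    assigned a dict literal, or if any key is not a plain string literal.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        target_names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name not in target_names or not isinstance(node.value, ast.Dict):
            continue
        keys = []
        for key in node.value.keys:
            # ``key`` is None for ``**other`` unpacking, which we cannot resolve.
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                keys.append(key.value)
            else:
                return None
        return keys
    return None


# Hypothetical untrusted module: the os.system call is parsed but never run.
untrusted_source = '''
import os
os.system("echo this would run on import")
recipes = {"daily": build("daily"), "monthly": build("monthly")}
'''

print(static_dict_keys(untrusted_source, "recipes"))  # ['daily', 'monthly']
```

This is closer in spirit to ast.literal_eval than to a sandbox: it limits what we can learn about the module rather than making execution safe.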
