pangeo-forge / user-stories

User stories to guide PF development
1 stars 0 forks source link

Security & importing contributed recipes #11

Open cisaacstern opened 2 years ago

cisaacstern commented 2 years ago

User Profile

As a project owner

User Action

I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes

User Goal

So that I know what security guardrails to observe while to developing new features on Pangeo Forge Cloud

Acceptance Criteria

An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details regarding motivating cases in Linked Issues section below.

Linked Issues

By way of background, there are two currently two places in the Registrar where we automatically create recipe runs in response to a push event:

  1. For recipes in a PR commit
  2. For recipes pushed to the default branch of a merged feedstock

In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.

What I'd like to discuss here is first case, wherein the submitted code is truly untrusted in the sense that literally anyone in the whole world can make a PR to /staged-recipes, and if it has a properly formatted and complete meta.yaml, then recipe runs will be created for all recipes listed in the meta.yaml. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.

Certain open User Stories challenge this model, however. Namely:

In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as https://github.com/pangeo-forge/user-stories/issues/3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call self.sha256() on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for https://github.com/pangeo-forge/user-stories/issues/10, we wouldn't know the names of the individual recipes within a dict_object without importing the recipe module and introspecting the specified dictionary.

Both of these User Stories have real, already-existing contributors that would like to use them, and from a design perspective would be big improvements to the platform. They would also be specifically useful for the low trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.

A few further questions/possibilities to kick off discussion:

cisaacstern commented 2 years ago

Thanks @jbusecke for the helpful suggestion of starting by enabling #10 just for members of the pangeo-forge GitHub org. This seems like a good way to get some use out of these features and see how they perform, without viewing the issue discussed here to be a fundamental blocker.