Github Trust Levels - Githubissues

ahal commented 11 months ago

In Github we only use two trust levels:

L1 for github-pull-request (and also everything else if the overall project is level 1) L3 for github-push, action and cron

L2 is unused. There are some problems with this setup.

First, there are actually four levels of trust in our Github projects:

Pull request from external contributor
Pull request from collaborator (has at least triage permission)
Push to unprotected branch (has at least write permission)
Push to protected branch (has passed all branch protection rules or is admin)

These don't get differentiated with the level system, which means we need to be careful in our ci-config grants as we can't rely on the level to provide security guarantees. For example, we must only grant secrets to the pull-request:trusted job and not pull-request:untrusted. Messing this up could leak a secret.

A more pressing factor is the Gecko migration to Github and what to do about project twigs. We could simply create a separate fork for each twig and assign the overall project as level 2 (like we do now), but it would be more convenient if folks could simply create their own branches on the main repo and have them default to level 2. I.e, the project would have per-branch levels instead of a project wide level.

So my ideal end state here is the following:

L1 is used for pull requests from external contributors
L2 is used for both pull requests from collaborators and pushes to unprotected branches (this recognizes that both these cases come from a trusted source, but the changes are not necessarily reviewed / valid)
L3 is used for pushes to protected branches

When working through the implementation, I ran into some troubles, so I'm hoping we can hash out what the implementation might look like and decide whether an RFC for this is feasible or not.

ahal commented 11 months ago

This comment explains the implementation I was thinking of, and the problems I ran into.

At a high level, ci-config and the project repos need to agree on what level a branch is. This could be accomplished by defaulting pushes to all branches (including main) to L2, but then explicitly listing the branches that should be L3 (if any). In ci-config this would be listed in the projects.yml, in repo this would be listed in taskcluster/config.yml (note level would no longer be set in .tc.yml here so a Taskgraph change is required).

So far so good, back to ci-config. When generating the scopes for any branch:<name> roles, ci-config would consult the branch config for the project. If the named branch should be level 3, it sets the substitution rule to level 3. If the branch isn't there or a wildcard is used, it sets level to 2. Also so far so good.

The problem is what to do about action and cron. Currently these have no branch association. I think we'd need actions / cron tasks that run on L3 branches to be L3. But if they run on L2 branches they should be L2. So we somehow need to tie actions / cron to branches and this is where I kind of gave up.

Perhaps, we could get rid of the action/cron jobs in ci-config and instead create branch:<name>:action:<name> roles, but that might be a lot of additional complexity in our grants!

Ideas and thoughts are welcome!

ahal commented 11 months ago

Or maybe we remove actions / cron from the jobs key:

- grant:
     - ...
  to:
    - project:
        type: action:*
        job: [branch:<name>]

(really stretching here)

bhearsum commented 11 months ago

Thanks for filing this and all the initial thinking through! I have some thoughts!

A few high level comments first:

Should add some gaps between the existing levels (eg: use L1, L5, and L10) to futureproof a bit? It feels almost silly, but I can imagine that we might, eg: separate out trusted PRs from unprotected branches at some point, and with the current L1/2/3 proposal that would be difficult to do.
I think we will need cron support on multiple branches (to support things like beta and release branches for Gecko). I don't think we've ever had cron on multiple branches in the past - so there may be some unexpected issues here.
I also think that we will need to support actions in PRs (most notably for reruns) when Gecko migrates to GitHub. This doesn't necessarily have to happen as part of whatever comes out of this, but we should keep it in mind. (I know you're touching on this a bit already.)

A more pressing factor is the Gecko migration to Github and what to do about project twigs. We could simply create a separate fork for each twig and assign the overall project as level 2 (like we do now), but it would be more convenient if folks could simply create their own branches on the main repo and have them default to level 2. I.e, the project would have per-branch levels instead of a project wide level.

I agree that we should avoid forks -- the benefits of completely self serve twigs / working branches are high enough that unless there are very strong reasons that we can't make this happen, we should strive for it. (And it may even be the case that other stakeholders may shoot down the idea of forks altogether.)

L2 is used for both pull requests from collaborators and pushes to unprotected branches (this recognizes that both these cases come from a trusted source, but the changes are not necessarily reviewed / valid)

I think this sounds very reasonable. It's notable that the uses cases for PRs and branches are different (branches are usually much more long lived, and may want additional things such as updates enabled), but I don't think that changes anything from a permissions/scopes standpoint.

When generating the scopes for any branch: roles, ci-config would consult the branch config for the project.

I assume that "branch config" in this context is something in ci-config (and that we wouldn't be leaving the repo in control of this?).

The problem is what to do about action and cron. Currently these have no branch association. I think we'd need actions / cron tasks that run on L3 branches to be L3. But if they run on L2 branches they should be L2. So we somehow need to tie actions / cron to branches and this is where I kind of gave up. Perhaps, we could get rid of the action/cron jobs in ci-config and instead create branch::action: roles, but that might be a lot of additional complexity in our grants!

We'd also need something like pull-request:action:<name>, I think.

And this might simply be necessarily complexity at this point...so maybe we shouldn't shy away from it? I'm struggling to come up with any other ideas that would allow actions and cron to be tied to branches and PRs.

ahal commented 11 months ago

Thanks for the comments!

I agree that we should avoid forks -- the benefits of completely self serve twigs / working branches are high enough that unless there are very strong reasons that we can't make this happen, we should strive for it. (And it may even be the case that other stakeholders may shoot down the idea of forks altogether.)

I agree, it doesn't seem insurmountable.. but we should have buy in and SecOps approval before we go ahead with implementation. I wonder what the current plan is for twigs (if any).. I'll touch base with glob.

I assume that "branch config" in this context is something in ci-config (and that we wouldn't be leaving the repo in control of this?).

Yeah, so in projects.yml you might have:

firefox-android:
    level-3-branches:
        - main
        - releases-*

Of course Taskgraph also needs to know what level a branch is, so it will need a duplicate copy of this in config.yml. But if Taskgraph's copy gets out of sync, it would cause a scope error rather than tasks to run where they shouldn't.

hwine commented 11 months ago

[Hal thinks some of his thoughts on terminology also apply here.]

bhearsum commented 7 months ago

Another interesting case came up in https://github.com/mozilla/firefox-translations-training/issues/375 where we want everything that L3 implies when it comes to scriptworkers (eg: CoT verification), but due to Translations needing live logs and interactive tasks enabled, @hwine/SecOps are not comfortable calling the production workers "level 3". We've been talking about using L2 for this, and we may consider other options as well (especially since it conflicts with the idea of L2 being proposed here).

ahal commented 7 months ago

I think we need to step back and agree on what level means, and whether it's capable of satisfying all our definitions.

In my mind, levels are just a way to distinguish different degrees of trust. E.g, the L3 of one project could actually be less secure than the L1 of another. What's important is that within the project, there's a distinction between all the levels.

But it sounds like to @hwine level implies a certain threshold of security best practices.

I can see why there is value to both these definitions, but I'm not sure we can re-use the same nomenclature here. Maybe we can invent some new terminology for this (e.g security-rating?)

ahal commented 7 months ago

Or split it into two: trust_level and security_level.

Heh, this is complicated

hwine commented 7 months ago

But it sounds like to @hwine level implies a certain threshold of security best practices.

This is why we need some clarification. In my head, the whole "level" concept evolved from CVS's "commit level", which was a cross-project standard for how operations would be handled.

I agree that there are also varying impacts of operations within a project. Which does lead to a matrix of environments that a CI system may want to provide.

Maybe one way to frame our discussion is that we're looking at minimally-confusing name for the index into that matrix (please, let's keep it 2-dimensional :smile:).

bhearsum commented 5 months ago

I had a reread of this issue and https://github.com/mozilla-releng/releng-rfcs/pull/45 and I think we're all closer in agreement than we might've thought. The main thing that there's a difference of opinion on is what a "level" implies, but I'm not sure there's any disagreement on any the implications?

First, there are actually four levels of trust in our Github projects: Pull request from external contributor Pull request from collaborator (has at least triage permission) Push to unprotected branch (has at least write permission) Push to protected branch (has passed all branch protection rules or is admin)

This is a list of the ways we can group together a set of permissions/scopes, this is not the list of sets of permissions that we need to be able to grant, which I think is what @hwine might be looking for?

If we flip this on its head, and make a list of all of the restrictions and/or security properties we're talking about, it might help clarify things. Here's a mapping of what I think our current restrictions/levels are:

Restricted action (below) / level (right)	1	2	3
Push/ship things production delivery systems		✓	✓
Push/ship things to non-production delivery systems	✓
Sign with production certificates		✓	✓
Sign with non-production certificate	✓	✓	✓
Push a branch at the same level	✓	✓	✓
View live logs	✓	✓
Create interactive tasks	✓	✓
Ship executables to users, or build executables that will ship to users			✓

Note the main difference between L2 & L3 being the involvement of end user artifacts.

I think this table is largely congruent with @ahal's proposal at the top of this issue:

L1 is used for pull requests from external contributors L2 is used for both pull requests from collaborators and pushes to unprotected branches (this recognizes that both these cases come from a trusted source, but the changes are not necessarily reviewed / valid) L3 is used for pushes to protected branches

One quibble I would make is that I don't think that a branch being protected necessarily makes it L3 - branches could be protected for reasons other than being used to build end user artifacts. However, anything running on L3 workers must be from a protected branch (I wonder if we this is something verify as part of chain of trust, even?).

One case that I know will not be congruent with the existing levels and above table is https://github.com/mozilla/firefox-translations-training/issues/375 - where we are building things that will ship to users (granted they are language models, not binaries), but require live logs & interactive tasks. This may necessitate a 4th set of restricted actions. However, that 4th set would end up being the superset of what's current level 2 and 3...which doesn't sound terribly productive to me. I have a slightly different idea further down.

I can see why there is value to both these definitions, but I'm not sure we can re-use the same nomenclature here. Maybe we can invent some new terminology for this (e.g security-rating?)

I don't know if it will work out, but I think this is an excellent time to consider a nomenclature change, as it's not clear to me that there will be strict hierarchy (at least not an obvious one) between all sets of restrictions we come up with. It's worth noting that if we come up with a system that doesn't have a hierarchy, that may have implications for how cached tasks work - specifically whether or not we can re-use cached tasks across "levels" (or whatever we end up calling them).

With the above two things in mind, here is a new table, with slightly tweaked restricted actions and a purge of the "level" nomenclature:

Restricted action (below) / ??? (right)	?	??	???
Push/ship things production delivery systems		✓	✓
Push/ship things to non-production delivery systems	✓
Sign with production certificates		✓	✓
Sign with non-production certificate	✓	✓	✓
Push a branch at the same level	✓	✓	✓
View live logs	✓	✓
Create interactive tasks	✓	✓
Build non-executable things that will ship to users		✓
Ship executables to users, or build executables that will ship to users			✓

(I'm not sure "executables" is the right word here - what I'm trying to spell out it that there's a difference between shipping executable code and/or interpreters to users and shipping other things. Examples of the latter are things like the translations language models, but could conceivably encompass other things as well.)

I don't have any great ideas as a replacement for "level", but I'm also not as convinced that we need to replace it after reviewing all of this (but still open to it).

mozilla-releng / releng-rfcs

Github Trust Levels #46