RFC for signing/verifying remotely referenced taskcluster.yml files

bhearsum commented 11 months ago

This is an addendum to #182. I'll note that the contents of the RFC only cover verification, because that's the only part that Taskcluster the platform cares about.

In the Firefox CI cluster, I expect that we'll be signing these through Autograph (most likely via https://github.com/mozilla-releng/adhoc-signing at first), and copying the signatures into wherever we publish the .taskcluster.yml files.

lotas commented 10 months ago

We were discussing this with Pete.. :)

The biggest question so far was what problem are we trying to solve? Protect what and from whom?

Some extra ideas that popped up: use scopes

We can add a github:allow-includes scope and make GitHub repo roles include it. If a repo needs this, you can just add the scope. This way you stay flexible and don't lock the decision into the deployment. Going further, you could add more control with scopes that name the allowed URLs: github:allow-includes:github.com/releng/baseline, etc. Just a thought.
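
For illustration, the URL-qualified variant of this idea could be checked roughly as follows. This is a minimal sketch, not Taskcluster code: the `github:allow-includes:<url>` scope is only the proposal above, and the function names are hypothetical. It relies on the Taskcluster convention that a held scope ending in `*` satisfies any scope that starts with the preceding prefix.

```python
from __future__ import annotations


def scope_satisfies(held: str, required: str) -> bool:
    """Return True if a held scope satisfies a required scope.

    A trailing '*' in the held scope matches any required scope that
    starts with the text before the '*'. Otherwise the match is exact.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def include_allowed(repo_scopes: list[str], include_url: str) -> bool:
    """Check whether a repo's scopes permit including a remote file.

    The 'github:allow-includes:<url>' shape is an assumption taken from
    the suggestion in this thread; it is not an existing scope.
    """
    required = f"github:allow-includes:{include_url}"
    return any(scope_satisfies(held, required) for held in repo_scopes)
```

With a role granting `github:allow-includes:github.com/releng/*`, a reference to `github.com/releng/baseline` would pass and one to `github.com/other/repo` would not. The bare `github:allow-includes` scope (no URL) is not modelled here.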

bhearsum commented 10 months ago

The RFC doesn't cover how the service obtains the key(s) to validate the signatures.

The current draft has them specified in Taskcluster-Github's config.yml. I can see that it is perhaps not specific enough though; maybe that's what you're referring to?

For a multitenant environment, I think it would be better for the repo to stipulate if it requires a signature, and which signing keys it accepts, rather than have a single global key that can be used for signing across the entire deployment, or a single set of keys that apply to all projects. This feels like it should be repo config, empowering the project users who the CI is for.

We're getting close to a point where what I believe Firefox CI needs is nearly incompatible with what is wanted by Taskcluster in general. Specifically, I think we want all of the following for Firefox CI:

Repositories should not be able to opt out of signature checks if they are using a remotely referenced .taskcluster.yml
Some (possibly all) repositories should not be able to specify their own keys (I'm thinking of level 3 repositories here, where we are very strict about things that go into CI.)
Some (possibly all) repositories should only be allowed to pull remotely referenced .taskcluster.yml files from locations specified by the Taskcluster-GitHub deployment. (This was a SecOps ask in the RRA.)
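
To make the three requirements concrete, here is a rough sketch of how a deployment-level check might combine them. Everything in it is hypothetical: `DeploymentPolicy`, its fields, and `check_remote_reference` are illustrative names, not part of Taskcluster-GitHub or the RFC. It only models the policy decision and assumes that cryptographic verification of the signature happens elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeploymentPolicy:
    """Hypothetical deployment-level settings (not a real Taskcluster config)."""

    # Exact locations from which remote .taskcluster.yml files may be pulled.
    allowed_sources: set[str] = field(default_factory=set)
    # Identifiers of the signing keys the deployment trusts.
    trusted_key_ids: set[str] = field(default_factory=set)


def check_remote_reference(policy, source_url, signature_key_id,
                           repo_supplied_key_ids=()):
    """Return a list of policy violations; an empty list means allowed.

    signature_key_id is the key that produced an already verified
    signature, or None if the remote file is unsigned.
    """
    violations = []
    # Requirement 1: no opting out of signature checks.
    if signature_key_id is None:
        violations.append("remote reference is unsigned")
    elif signature_key_id not in policy.trusted_key_ids:
        violations.append("signature key is not trusted by the deployment")
    # Requirement 2: repositories may not bring their own keys.
    if repo_supplied_key_ids:
        violations.append("repository may not specify its own keys")
    # Requirement 3: only deployment-approved locations (exact match).
    if source_url not in policy.allowed_sources:
        violations.append("source is not approved by the deployment")
    return violations
```

Sources are matched exactly rather than by prefix, so that an approved location cannot be spoofed by a URL that merely starts with the same characters.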

Many (all?) of these are quite at odds with what Taskcluster in general seems to want. I'm struggling to come up with a viable path forward here. I'm tempted to say that SecOps and the Taskcluster team need to work together to come up with it - I feel that I'm largely acting as an intermediary here.

I think this design is much more flexible, more transparent, and puts the control in the hands of the projects that use it. My concern with the platform deployment approach is that it assumes a Taskcluster deployment is controlled by a central team, blocks project teams when those staff are not available, and does not support multi-tenant environments. It is also more opaque: it is harder to troubleshoot why the wrong signing key might be in use, and harder to change the signing key(s) if they need updating (because they are hidden behind platform config and only visible to operational staff).

I think having it in the .taskcluster.yml makes each .taskcluster.yml a little bit bigger, but the config in there is unlikely to change frequently. If a key is rotated, the change is much more visible: git history provides an audit trail of what changed and who changed it. It also allows you to roll out changes gradually if required, while a script can still update all repos in one go. This supports the environment changing at Mozilla too, if it stops being a single team that controls all the CI pipelines of the whole company, and some teams need to move quickly but would like to adopt the same security approach. It is more flexible regarding changes to the organisation.

I understand what you're saying about flexibility, but we're not talking about something here that has no workarounds. If you want to include a .taskcluster.yml from a non-approved source, you would have two options: 1) Talk to RelEng and either get that source added, or move the .taskcluster.yml to an already approved source. 2) Live without the remote reference, and copy in the contents.

There is no hard block stopping work here in any case - you can always do whatever you want in the .taskcluster.yml in a repo you control.

bhearsum commented 10 months ago

We were discussing this with Pete.. :)

The biggest question so far was what problem are we trying to solve? Protect what and from whom?

The goal is to ensure that the remote .taskcluster.yml that is processed was authored and published by a known good source. (To guard against man in the middle attacks, compromised GitHub accounts, etc.)
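
As a rough sketch of that goal: the service would refuse to process a remote file unless its detached signature validates against at least one trusted key. The names below (`load_verified_config`, `UntrustedConfigError`, the `Verifier` type) are hypothetical, and the actual signature check is passed in as a callable because the RFC does not fix a signature scheme or library here.

```python
from typing import Callable, Iterable

# (public_key, signature, payload) -> True if the signature is valid.
# Assumed to be backed by a real cryptographic library in practice.
Verifier = Callable[[bytes, bytes, bytes], bool]


class UntrustedConfigError(Exception):
    """Raised when no trusted key validates the remote file."""


def load_verified_config(payload: bytes, signature: bytes,
                         trusted_keys: Iterable[bytes],
                         verify: Verifier) -> bytes:
    """Return the payload only if some trusted key validates its signature."""
    for key in trusted_keys:
        if verify(key, signature, payload):
            return payload
    # Fail closed: an unverifiable file is never handed on for processing.
    raise UntrustedConfigError("no trusted key validates the signature")
```

Accepting several trusted keys is an assumption made so that key rotation can overlap; a deployment with a single key would pass a one-element list.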

Some extra ideas that popped up: use scopes

We can add a github:allow-includes scope and make GitHub repo roles include it. If a repo needs this, you can just add the scope. This way you stay flexible and don't lock the decision into the deployment. Going further, you could add more control with scopes that name the allowed URLs: github:allow-includes:github.com/releng/baseline, etc. Just a thought.

I'm not sure I fully understand this suggestion...are you saying that these scopes would control which repositories remotely referenced .taskcluster.yml files could come from? If so, that seems like a reasonable alternative to mapping project repositories to these repos in the Taskcluster-GitHub configuration. (It doesn't solve the integrity checking part of this - but it does address another thing that SecOps wanted.)

ahal commented 10 months ago

I think the disconnect here is stemming from the fact that the Taskcluster team are approaching this with the lens of developers as the target users and a "hacker ethos" (empower them as much as possible).

I think normally that's the right approach, but in this case our aim is to lock things down, the opposite of empowering them. Think of it from the lens of selling Taskcluster to an enterprise user and the request makes a lot more sense. Enterprise users (and fxci) need controls to prevent footguns and security oopsies. I think Taskcluster is best suited for large enterprises, so IMO it makes a ton of sense to build these controls directly into the platform.

That's not to say we need to enforce these controls on anyone. Every instance can be free to use or not use them as they see fit.

With that in mind, @petemoore is there any compelling reason not to specify the keys as a deployment configuration?

ahal commented 10 months ago

Also, there's no reason they couldn't be configurable in both the deployment and the .taskcluster.yml if you wanted, but I don't think fxci would use the .taskcluster.yml version, so it would likely be a case of YAGNI.

RFC for signing/verifying remotely referenced taskcluster.yml files #187