Clarify how SLSA interacts with self-hosted runners #966

Closed MarkLodato closed 11 months ago

The SLSA specification could use more explicit guidance on how to handle self-hosted runners.

Problem

Many central build platforms, such as GitLab CI and GitHub Actions, have two options for where the build executes:

- platform-hosted: the central build platform provides the build environment
- self-hosted: the tenant provides its own build environment

This choice is usually not a separate "external parameter"; rather, it is a field inside the build configuration.

People have concerns that (a) a self-hosted runner might be less secure than a platform-hosted runner, such that it does not guarantee the isolation requirements of SLSA Build L3; and/or (b) a specific self-hosted runner might be more secure and thus required by the customer.

Related: #362, possibly others? /cc @samwhite-gl @arewm

Status quo

Technically we already have a recommendation in the spec, but it's weak and obscure, requiring a bit of a mental leap to realize:

In provenance (source):

The id MUST reflect the trust base that consumers care about. How detailed to be is a judgement call. For example, GitHub Actions supports both GitHub-hosted runners and self-hosted runners. The GitHub-hosted runner might be a single identity because it's all GitHub from the consumer's perspective. Meanwhile, each self-hosted runner might have its own identity because not all runners are trusted by all consumers.
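As a minimal sketch of what that recommendation could look like on the platform side, the function below gives every platform-hosted runner one shared identity and each self-hosted runner its own. The `https://ci.example.com` host, the URI layout, and the `Runner` type are all hypothetical; they do not come from the spec, GitHub Actions, or GitLab CI.

```python
from dataclasses import dataclass

# Hypothetical base URI for the build platform; not a real builder.id.
PLATFORM_BASE = "https://ci.example.com"


@dataclass(frozen=True)
class Runner:
    hosted_by_platform: bool  # True for platform-hosted runners
    runner_id: str            # tenant-visible runner identifier (hypothetical)


def builder_id_for(runner: Runner) -> str:
    """Return the builder.id a platform might record for a given runner.

    All platform-hosted runners share one identity, because from the
    consumer's perspective they are all the same trust base. Each
    self-hosted runner gets its own identity so that consumers can decide
    individually which ones to trust.
    """
    if runner.hosted_by_platform:
        return f"{PLATFORM_BASE}/runners/hosted"
    return f"{PLATFORM_BASE}/runners/self-hosted/{runner.runner_id}"
```

For example, `builder_id_for(Runner(False, "acme-01"))` yields `https://ci.example.com/runners/self-hosted/acme-01`, while any two platform-hosted runners yield the same ID.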

And then in Verifying artifacts (source):

Look up the SLSA Build Level in the roots of trust, using the recognized public keys and the builder.id, defaulting to SLSA Build L1.

Combined, this means that the builder ought to embed the runner's identity in the builder.id and then the consumer ought to choose which specific runners they trust (or ignore the runner portion of the ID, if they don't care) in their verification policy. But this clearly isn't coming across, and it's only a weak recommendation.
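Putting the two quoted rules together, a consumer-side check might look like the following minimal sketch. It assumes the signature has already been verified against a recognized public key and that the input is a SLSA v1.0 provenance predicate, where the builder identity lives at `runDetails.builder.id`. The `/*` suffix meaning "ignore the runner portion" is an invented policy syntax, and all IDs are hypothetical.

```python
from typing import Mapping

# Consumer policy: exact builder.id, or a prefix ending in "/*",
# mapped to the SLSA Build Level the consumer is willing to grant.
Policy = Mapping[str, int]


def build_level(predicate: dict, policy: Policy) -> int:
    """Return the Build Level a consumer assigns to verified provenance.

    Exact entries trust one specific runner; "/*" entries ignore the
    runner portion of the ID. Anything unmatched defaults to Build L1.
    """
    builder_id = predicate["runDetails"]["builder"]["id"]
    if builder_id in policy:
        return policy[builder_id]
    for entry, level in policy.items():
        # Keep the trailing "/" so ".../runners/*" cannot match ".../runnersX/...".
        if entry.endswith("/*") and builder_id.startswith(entry[:-1]):
            return level
    return 1
```

With a policy such as `{"https://ci.example.com/runners/hosted": 3, "https://ci.example.com/runners/self-hosted/acme-01": 3}`, provenance from any other self-hosted runner falls back to Build L1, which is the "choose which specific runners they trust" behavior described above.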

To my knowledge, this is already what GitLab CI does. I'm not sure about GitHub Actions or other build platforms.

My thoughts

I (@MarkLodato) think we should:

- Strengthen the recommendation in provenance.md so that it's a SHOULD and is more visible (not sure how).
- Call out self-hosted runners in verifying-artifacts.md while talking about builder.id.
- Call out self-hosted runners in requirements.md, though I'm not sure exactly how.

A counterargument is that a self-hosted runner is just one type of remote execution, and the user's build script could do arbitrary remote execution that is invisible to the build platform. Why are we calling out self-hosted runners explicitly? I am sympathetic to this argument, but here it seems like a win to at least partially address this. Self-hosted runners are an obvious thing that people have questions about and we can address. We can address other remote execution down the line.

I agree with the rationale for providing more guidance about self-hosted runners, and I think the proposals here sound good.

Thanks for creating the issue @MarkLodato!

> People have concerns that (a) a self-hosted runner might be less secure than a platform-hosted runner, such that it does not guarantee the isolation requirements of SLSA Build L3; and/or (b) a specific self-hosted runner might be more secure and thus required by the customer.

Please forgive my lack of deep understanding of SLSA's inner workings and the requirements for the various levels, as I haven't worked with it much in the past. I just wonder whether this problem revolves around the identity of whoever certifies that the build environment used during the build was actually trusted.

If it is a customer's self-managed runner, it might not be appropriate for the platform (which has no control over the build process there) to certify that the build environment was trusted. Hence, using the platform-provided identity to sign build artifacts may give a false sense of security if builder.id is set by a self-managed runner.

From what I understand (and I may be completely wrong here), the builder.id is written into the provenance by the runner itself. If the runner is compromised or tampered with, the platform-provided identity could be used to sign a forged attestation. Could a self-managed runner even be modified so that it generates an attestation claiming that the build was done on a platform-managed runner?

I'm sorry in advance if my understanding of how it works is incorrect! :pray:
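The concern above can be made concrete with a toy model. This sketch uses an HMAC from Python's standard library as a stand-in for whatever signing scheme a real platform uses (an assumption; real provenance is typically signed with asymmetric keys), and the key and builder.id are hypothetical. It shows that a valid signature only proves who held the credential, not that the claims inside are true.

```python
import hashlib
import hmac
import json

# Hypothetical signing secret. In this toy model the platform has handed it
# to the runner so that the runner can sign provenance itself.
SIGNING_KEY = b"platform-signing-key"


def sign(payload: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the provenance payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    """Check the signature; this says nothing about whether the claims are true."""
    return hmac.compare_digest(sign(payload, key), signature)


# A tampered self-hosted runner that holds the key can claim anything,
# including that it is a platform-hosted runner.
forged = {"runDetails": {"builder": {"id": "https://ci.example.com/runners/hosted"}}}
forged_sig = sign(forged, SIGNING_KEY)
print(verify(forged, forged_sig, SIGNING_KEY))  # True: the forgery verifies
```

In this model the outcome depends entirely on which component holds the signing credential, which is the question the rest of the thread turns to.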

> A counterargument is that a self-hosted runner is just one type of remote execution, and the user's build script could do arbitrary remote execution that is invisible to the build platform. Why are we calling out self-hosted runners explicitly? I am sympathetic to this argument, but here it seems like a win to at least partially address this. Self-hosted runners are an obvious thing that people have questions about and we can address. We can address other remote execution down the line.

This was the line of thinking when we were writing up the requirements:

There are no sub-requirements on the build itself. Build L3 is limited to ensuring that a well-intentioned build runs securely. It does not require that a build platform prevents a producer from performing a risky or insecure build. In particular, the "Isolated" requirement does not prohibit a build from calling out to a remote execution service or a "self-hosted runner" that is outside the trust boundary of the build platform. [ref]

Up to and including SLSA Build L3, we are concerned with having complete, accurate, and authentic provenance (except, potentially, for the completeness of resolved dependencies). At Build L3, we specify requirements only for the build platform, not for the build action, because the platform is the one required to generate the provenance.

How is the provenance actually generated in self-hosted runner situations? The requirements imposed on the runner itself would depend on who generates the provenance attestation. If the runner generates the provenance, then that runner would have to be SLSA Build conformant. If the provenance is generated outside of the runner (e.g., the approach in the recent bring-your-own-builder blog post), then that process would need to conform.

Therefore, in my mind, the counterargument still holds as long as the runner fits the case it describes: the runner is just a method for remote execution of the build, not for remote generation of the provenance.

@grzesiek and @arewm, that's a good point about who generates the provenance. Maybe that's what it hinges on. I imagine there are three cases:

1. The platform generates the provenance and just calls the runner for individual work items. Both the platform and the runner are in the trust base (the runner can generate bad output), but only the platform can influence the provenance. (This is what I had in mind initially.)
2. The runner generates the provenance. The platform is untrusted and has no influence over the provenance. In this case, the runner is actually the "builder" and the platform is irrelevant.
   - Variant: Same, but the platform generates some "external parameters" that are fed into the runner. In this case, the platform is relevant, but only for some fields in the provenance.
3. The platform provides the runner some sort of credential for generating provenance. In this case, both the platform and the runner can influence the provenance, and you have to consider the whole thing as one big platform. I hadn't considered this initially, so maybe this one should be treated differently?
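The three cases can be summarized as a small lookup table. The `Case` and `Analysis` names and the component labels are illustrative, not SLSA terminology. The variant of the runner-generates case is noted only in a comment, because it affects specific provenance fields rather than who is in the trust base.

```python
from enum import Enum
from typing import FrozenSet, NamedTuple


class Case(Enum):
    """Who generates the provenance (illustrative labels for the three cases)."""
    PLATFORM_GENERATES = 1  # platform writes provenance; runner only executes work
    RUNNER_GENERATES = 2    # runner writes provenance; it is effectively the builder
    SHARED_CREDENTIAL = 3   # platform hands the runner a provenance credential


class Analysis(NamedTuple):
    trust_base: FrozenSet[str]                # components the consumer must trust
    can_influence_provenance: FrozenSet[str]  # components that can shape provenance


CASES = {
    # The runner can generate bad output, but only the platform touches provenance.
    Case.PLATFORM_GENERATES: Analysis(
        frozenset({"platform", "runner"}), frozenset({"platform"})
    ),
    # The platform is untrusted and irrelevant, except in the variant where it
    # supplies some external parameters; then it matters only for those fields.
    Case.RUNNER_GENERATES: Analysis(frozenset({"runner"}), frozenset({"runner"})),
    # Both can shape the provenance, so treat them as one big platform.
    Case.SHARED_CREDENTIAL: Analysis(
        frozenset({"platform", "runner"}), frozenset({"platform", "runner"})
    ),
}
```

Laid out this way, only the first case separates "can corrupt the output" from "can shape the provenance", which is why it is the one where a self-hosted runner does not change what the provenance can be trusted to say.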

> The platform generates the provenance and just calls the runner for individual work items. Both the platform and the runner are in the trust base (the runner can generate bad output), but only the platform can influence the provenance. (This is what I had in mind initially.)

Is this what you had in mind initially when we wrote the specification, or when you created this issue?

> The platform provides the runner some sort of credential for generating provenance. In this case, both the platform and the runner can influence the provenance, and you have to consider the whole thing as one big platform. I hadn't considered this initially, so maybe this one should be treated differently?

Can we treat it the same by making clear that any system (the platform and/or a remote build environment) that is responsible for informing the provenance must be held to the requirements (i.e., secure key storage, isolation, etc.)?

It is a different case in implementation, but the specifics of an implementation should be able to be taken into account during an audit or conformance process. If these parts are controlled by different entities, then both would be in the transitive closure of the build platform.
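Treating "any system responsible for informing the provenance" as a graph problem gives one way to enumerate what an audit would have to cover. This is a sketch under the assumption that the influence relationships are already known; the graph model and the component names are hypothetical and not something the spec defines.

```python
from collections import deque
from typing import Dict, Iterable, Set


def provenance_trust_closure(
    influences: Dict[str, Iterable[str]], provenance_writer: str
) -> Set[str]:
    """Return every component that can directly or indirectly affect provenance.

    `influences[x]` lists the components whose output flows into what `x`
    writes (for example, a runner that received a signing credential from
    the platform). Starting from the component that writes the provenance,
    a breadth-first search follows those edges upstream.
    """
    seen = {provenance_writer}
    queue = deque([provenance_writer])
    while queue:
        node = queue.popleft()
        for upstream in influences.get(node, ()):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen
```

For example, with `{"signer": ["platform-control-plane", "self-hosted-runner"], "self-hosted-runner": ["runner-host-os"]}`, the closure from `"signer"` includes the runner's host OS. Under the "one big platform" view, that would also need to meet the requirements, regardless of which entity operates it.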

> Is this what you had in mind initially when we wrote the specification, or when you created this issue?

When creating the issue.

> Can we treat it the same by making clear that any system (the platform and/or a remote build environment) that is responsible for informing the provenance must be held to the requirements (i.e., secure key storage, isolation, etc.)?

Yes, agreed. I should have said "called out specifically".

I think the main issue with self-hosted runners is that it's difficult for readers to make the mental leap between the spec as written and the details of a real-world CI/CD system with self-hosted runners. Doing so requires a solid grasp of both SLSA and the technical security details of how the CI/CD system is implemented, and even then requires some careful thinking and analysis. My hope is that we can expand the text in the spec to offer more guidelines for these common situations so that readers can understand more quickly without having to do so much analysis. Does that make sense? Do you agree?

That makes sense, thanks. I do agree that readers might be challenged with a large mental leap requiring detailed knowledge of multiple aspects.

I am trying to figure out where best to record this type of clarification. There are two options that I see offhand:

1. Putting the clarification into the requirements text
   - We have already established this pattern by mentioning the self-hosted runner in the requirement. Further clarification in the requirements might greatly increase the length of the text, distracting readers from the core requirement specifications.
   - Putting the content in the requirements will let us ensure that the specification as written is clearly worded and relatable to real-world CI/CD systems.
2. Putting the clarification into the FAQ
   - This type of topic feels like exactly what an FAQ section is designed for, because it is a frequently asked question. If we start to fill out the FAQ with clarifications like these, we should investigate restructuring it to make discovery easier within the FAQ document itself (and we can link to specific sections from places in the specification). The FAQ is versioned, so changes to it can be tailored to specific SLSA specification versions; however, versioned clarifications can be harder to discover unless we make a concerted effort to add each clarification to every relevant versioned FAQ page.
   - Putting the content in the FAQ will let us inform readers about how to apply the specification to certain common real-world CI/CD systems.

I have a slight preference for the latter (refining the structure of the FAQ so we can start using it to explain how to apply the SLSA specification), so that we can keep the specification itself from growing too large. This would also let us use a more long-form style for clarifications.

Yeah, I agree that both seem viable and also lean towards the FAQ.

I proposed a change to the FAQs: https://github.com/slsa-framework/slsa/pull/989
