Closed haydentherapper closed 1 year ago
Another useful data point: Tekton (which also drives other tools like JenkinsX) would be limited by the Kubernetes JWT workload claims. (I assume this would also affect prow and other Kubernetes-based CI.)
job_workflow_ref may be unnecessary if we can construct it from other claim values like workflow and ref.
I don't think that's correct. In your link, the workflow, ref, and other attributes describe the caller workflow, while job_workflow_ref refers to the called workflow.
These are different, and we rely on them for SLSA 3 builders to demonstrate the identity of the trusted builder, the called workflow, which is distinct from the caller.
There was conversation in https://github.com/sigstore/fulcio/issues/624 about including the run ID (run_id), run count (run_number) and attempt count (run_attempt). We should decide if these should be required for Fulcio certificates.
I'm actually on the side that it should not: these values can easily be added inside a signed attestation -- this is very much like recreating provenance inside a signing certificate. See https://github.com/slsa-framework/slsa/issues/464 related issue. The signing cert contains enough builder information that "You could think of the x509 builder as a first-stage builder, which is limited but sets the "root of trust". " We definitely don't NEED to include all the provenance inside the certificate. @laurentsimon
Another useful claim may be actor, who triggered the CI run.
Again, I think this starts turning into provenance metadata.
At minimum the signing cert should contain just the necessary info to identify the workflow: including the caller and called workflow and its commit SHA.
For example, user IDs should be used instead of usernames, and repository IDs should be used instead of repository names, to prevent resurrection attacks.
BIG +1! GitHub does expose repository IDs. Although, think of the verification side: it is much harder for humans to verify the cert fields to see if a signature came from a repository when it is identified by a repository ID. Again, that can go in provenance info (EDIT: maybe not? since the repository might be an untrusted resurrected one)
These are different, and we rely on them for SLSA 3 builders to demonstrate the identity of the trusted builder, the called workflow, which is distinct from the caller.
Thanks for noting this, I've removed this from the issue description so it's now required.
I'm actually on the side that it should not:
I am in agreement, I believe Laurent as well from the discussion.
think of the verification side: it is much harder for humans to verify the cert fields to see if a signature came from a repository
I think we should be building verification policies around IDs and not human-readable values. I do agree it's harder for a human to validate it though, but I think this can be solved with better UX.
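To make the ID-versus-name point concrete, here is a minimal sketch of a policy check that ignores the human-readable name. The `Identity` type, its field names, and the example values are all hypothetical; only the idea of matching on an immutable repository ID comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical subset of the values a certificate might carry."""
    repository: str     # human-readable name; can be renamed or re-registered
    repository_id: str  # assumed to stay fixed for the life of the repository


def matches_policy(identity: Identity, trusted_repository_id: str) -> bool:
    # Compare only the immutable ID; the name is kept for display purposes.
    return identity.repository_id == trusted_repository_id


# Same name, different ID: the second one models a "resurrected" repository.
original = Identity(repository="acme/widgets", repository_id="1001")
resurrected = Identity(repository="acme/widgets", repository_id="2002")
```

A name-based policy would accept both identities here, while the ID-based check accepts only the first. Showing the name next to the ID in tooling is one way the UX concern could be addressed.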
Related to the job_workflow_ref and workflow claims in the token issued by GitHub Actions . . .
The job_workflow_ref claim provides the full path to the called workflow:
slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/heads/main
Whereas the workflow claim only identifies the logical name of the calling workflow:
build
This name isn't guaranteed to be unique across the workflows defined in a particular repo so this isn't particularly useful in identifying the calling workflow.
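For illustration, the job_workflow_ref value above can be split into its parts roughly like this. This is a sketch that assumes the `owner/repo/path@ref` shape of the example and that the first `@` separates the path from the ref; the `WorkflowRef` type and function name are made up for this example.

```python
from typing import NamedTuple


class WorkflowRef(NamedTuple):
    owner: str
    repo: str
    path: str
    ref: str


def parse_job_workflow_ref(value: str) -> WorkflowRef:
    # Assumption: the first "@" separates the workflow location from the git ref.
    location, sep, ref = value.partition("@")
    if not sep or not ref:
        raise ValueError(f"missing '@<ref>' in {value!r}")
    # Assumption: the location is "<owner>/<repo>/<path to workflow file>".
    pieces = location.split("/", 2)
    if len(pieces) != 3 or not all(pieces):
        raise ValueError(f"expected owner/repo/path in {value!r}")
    owner, repo, path = pieces
    return WorkflowRef(owner, repo, path, ref)


example = parse_job_workflow_ref(
    "slsa-framework/slsa-github-generator/.github/workflows/"
    "generator_generic_slsa3.yml@refs/heads/main"
)
```

Note that every component recovered this way (owner name, repo name, branch ref) is still a mutable name, which is the concern raised elsewhere in this thread.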
I think we should be building verification policies around IDs and not human-readable values. I do agree it's harder for a human to validate it though, but I think this can be solved with better UX.
Maybe a broader question is whether you see Sigstore as a building block / "root of trust" for "richer" systems (re-usable workflows and similar systems) or not. If you consider Sigstore the fundamental building block / enabler / RoT, then you may not need to keep adding more fields in the OIDC token / cert.
+1 on agreeing on the set of minimum claims, like workflow(s), ref(s) and repo. I don't know enough about other non-GitHub CIs to say anything about actors and other pieces. On GitHub it's part of the GH context and can be retrieved by the "richer" builder, so is not necessarily needed. Not sure about other CIs
If you consider Sigstore a standalone solution for signing only (without richer trusted builders), then additional fields may be something to consider. Maybe it's a product decision...? Certificates are not the most human-friendly to work with, so it may limit the usability of a solution; whereas adding fields into a JSON-formatted provenance seems easier? Trusted builders can more easily incorporate changes over time (e.g., additional fields, new features), so maybe something to keep in mind as well.
At this point, I think we're in agreement on the following:
The top priority is: we should require enough information to uniquely identify the workflow that was run.
That is (provided nothing has been deleted), I would be able to go fetch from the CI system enough complete information to understand what the workflow did, and consequently what any artifacts that validate against this certificate are.
Ideally, I can even be convinced about what workflow ran without needing a round-trip to the CI system.
In order to uniquely identify the workflow run, we need immutable identifiers for everything.
We may or may not want additional information, including human-readable versions of IDs or identifying the runner.
I'd bet many of these have easy answers and I just don't have enough context/background to know them.
job_workflow_ref immutable identifiers
@bdehamer, the example of a job_workflow_ref you gave seems to support organization names, repo names, and tags/branches, all of which can change over time. Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?
I think in most cases, we're trying to check the claim "this artifact was produced by workflow X from repository Y at commit Z," where typically X is somehow a trusted workflow.
@asraa and @laurentsimon's blog post about SLSA 3 on GitHub Actions is really helpful for understanding how these things will be used.
In general, I think we should have sophisticated tools for verification; manually checking all these things makes my head spin, but one-time engineering work to convince me "these packages were built from their source on GitHub using a standard JavaScript builder" feels reasonable. I guess I'm in the @haydentherapper camp (or the "building block / root of trust" camp). I'd need to see use cases for the "signing only" camp that involve GitHub OIDC workflow identities.
Candidates:
Arguments for:
Arguments against:
My take: I think I'm convinced by @asraa and @laurentsimon here: minimal is better, and all of this is available elsewhere. Maybe with an exception for "what runner did this use?" since that seems like it could invalidate the provenance.
Maybe (see GitHub OIDC docs):
- repository_id: uniquely identifies a repository (do we need repository_owner_id too? can you fetch a repository based only on its ID?)
- sha: with repository_id, this uniquely identifies a source tree
- workflow: upthread, it's pointed out that this may not be unique within a repo; what should we use instead?
- job_workflow_ref (but with IDs): to identify a called reusable workflow
I worry that the "reusable workflow" concept is a little GitHub-specific, and we might want one field that combines workflow and job_workflow_ref.
Need to do some homework here.
I'd argue that any verifier needs to understand each CI system in order to properly understand how to interpret its repository IDs, workflow identifiers, etc., so my answer is "not very important."
It's definitely "nice" to have similar concepts represented by the same fields. However, it could be dangerous: maybe there are different types of hashes used by different systems, and you could confuse them? I think most scenarios in which this is a problem are pretty contrived.
If so, we should probably request those ASAP.
It feels nice to be able to have "standard" provenance fields that maintainers can easily incorporate but verifiers can still trust. It's a little out-of-scope for this issue, but I think if we have a satisfying solution here that means that it's pretty hard to argue for more than a "minimal" claim set. (That said, we may be able to build a solution for some CI systems but not others.)
https://github.com/slsa-framework/slsa-github-generator gets so close in my mind. The missing element is composability: right now, I can:
Use a special, build-process-specific combined SLSA provenance generator/builder (e.g, the Go builder).
This tightly couples the provenance generator and builder in a way I don't like: if I need an update to my builder workflow, this could affect provenance generation, and any changes that could affect provenance generation are very high-consequence. An attack could turn a routine update (that, if it only affected the builder, could at worst lead to bad artifacts) into a break-everything forgery (I can provide provenance with arbitrary data).
Plus, now there's one provenance generator for each ecosystem that needs to be audited. Even if we're reusing components, there's not an easy way to check "my provenance came from a trusted generator, even if the builder is bad" without going into the source of the builder and parsing the workflow.
Use a provenance-only generator.
I much prefer this from a security standpoint. However, anytime I mess with the calling workflow, I have the ability to change what artifacts are provided.
I would really like to say "use standard provenance generator @ commit X and standard node.js builder @ commit Y, together". Then, I would know that no matter what the source of the calling repository was, my artifact came from node build on that source.
I think it'll be much easier to decide what fields to use once we've answered the above questions for several candidate CI systems. Then, I think we should proceed with a minimal set of claims, and let users come to us with use cases that they don't meet.
Much of the above is a little bit off-topic, and I'm happy to table any discussions about those and pick them up elsewhere/later.
job_workflow_ref immutable identifiers
@bdehamer, the example of a job_workflow_ref you gave seems to support organization names, repo names, and tags/branches, all of which can change over time. Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?
+1 on having them, and asking GH to support it including for re-usable workflows if it's not available yet.
Do we need to identify anything beyond the workflow?
Candidates:
- the run (see Add Github workflow run information to the signing certificate #624)
- the runner (is it self-hosted? do non-GitHub runners even get access to OIDC? it seems like maybe they do)
self-hosted runners have access to OIDC. You need a round-trip to verify this unless it's added into OIDC token (we asked GH to do that, so it may happen in the future). One additional complexity is that it's possible for a workflow to declare jobs self-hosted and others not. Note: the trusted builder can hardcode it (we do that in our builders).
https://github.com/slsa-framework/slsa-github-generator gets so close in my mind. The missing element is composability: right now, I can:
- Use a special, build-process-specific combined SLSA provenance generator/builder (e.g, the Go builder). This tightly couples the provenance generator and builder in a way I don't like: if I need an update to my builder workflow, this could affect provenance generation, and any changes that could affect provenance generation are very high-consequence. An attack could turn a routine update (that, if it only affected the builder, could at worst lead to bad artifacts) into a break-everything forgery (I can provide provenance with arbitrary data).
I don't entirely follow. At least in our case, the build and the provenance generation are separate jobs. The format remains the same, and only the buildConfig / builder.id change across builders. Agreed that if the code that's responsible for populating the buildConfig can be hijacked, it could forge the steps. But this code is part of the TCB, IIUC.
Maybe you're proposing having a dedicated project for provenance generation only? We kind of have this in the generator repo. We don't expose it and only use it internally, though. We could, in theory, expose it through a GitHub Action.
Let me know if I misunderstood the comment.
Plus, now there's one provenance generator for each ecosystem that needs to be audited. Even if we're reusing components, there's not an easy way to check "my provenance came from a trusted generator, even if the builder is bad" without going into the source of the builder and parsing the workflow.
I think the plan is to share the provenance generation code with other builders for a given CI. On GitHub, we could theoretically create an Action for this. /cc @ianlewis
self-hosted runners have access to OIDC. You need a round-trip to verify this unless it's added into OIDC token (we asked GH to do that, so it may happen in the future). One additional complexity is that it's possible for a workflow to declare jobs self-hosted and others not. Note: the trusted builder can hardcode it (we do that in our builders).
TY! That helps.
Maybe you're proposing having a dedicated project for provenance generation only? We kinda of have this in the generator repo. We don't expose it and only use it internally, though. We could, in theory, expose it thru a GitHub action.
Let's move this conversation over to https://github.com/slsa-framework/slsa-github-generator/issues/763; apologies for the distraction from the root issue in this thread 😄
Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?
I don't know, but I'll try and track down the team here responsible for this stuff and make some inquiries.
can you fetch a repository based only on its ID
Yeah, there's a GET /repositories/:id endpoint that will look up a repo based solely on its ID (and the ID persists across renames and ownership changes)
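As a rough sketch of how a verifier could resolve an ID back to a human-readable name with that endpoint: the helper names are made up here, and the sketch assumes unauthenticated access to the public API at https://api.github.com is sufficient (private repositories and rate limits would need a token).

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"  # public GitHub REST API root


def repository_url(repository_id: int) -> str:
    # Builds the URL for the GET /repositories/:id lookup mentioned above.
    return f"{API_ROOT}/repositories/{repository_id}"


def fetch_repository_full_name(repository_id: int) -> str:
    # Performs a network request; returns the current "owner/name" of the repo.
    request = urllib.request.Request(
        repository_url(repository_id),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["full_name"]
```

This is the round trip that a human-friendly verification UX would probably hide: policies match on the ID, and the name is fetched only for display.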
I'd like to jumpstart some movement on this issue if possible, as we regard it as pretty important for our work on npm attestations, especially now that we have begun to reach out to some potential launch partners (read: cloud CI vendors with existing OIDC support) to talk about integration on their own platforms.
Additionally we have some commitments from the Actions team to extend the OIDC token with the types of fields discussed in this thread (though we may need to get some further alignment there). If we can get crisp on some non-GitHub nomenclature for the cert fields, I feel like we're a long way toward settling this. Is anyone taking a stab at some generic naming notions? Should we try to chat in Sigstore Slack about a plan for settling this into a PR?
Let's get a chat going either on Slack or here, there hasn't been any progress.
Chiming in to describe some updates after we've had some conversations. I think some of this echoes @znewman01's discussion earlier.
We MUST have the certificate to identify (with immutable references) the smallest "trust domain" relevant for client verification. So for GitHub we MUST have:
Stuff I think we can punt:
Stuff I'm not sure of:
If we do something like "we MUST have the reusable workflow immutable ref AND the caller immutable ref", then this lines up with the pattern for Buildkite https://github.com/sigstore/fulcio/pull/890 where the reusable workflow is the job_id and the caller immutable ref becomes the organization/pipeline slug. @sj26
@asraa the GH Actions team have just added some new claims to the ID token:
- job_workflow_sha: sha of the reusable workflow if one is used, otherwise will be the sha of the parent/triggering workflow, which can be from a different branch to the source repo/materials (is this version in the draft slsa v1 spec?)
- workflow_ref: Similar to job_workflow_ref but always points to the trigger workflow path (aka "entryPoint"), instead of the reusable workflow if one is used. This should replace use of the workflow claim that just points to the name of the triggering workflow.
- workflow_sha: Similar to job_workflow_sha but always points to the triggering workflow SHA (so could maybe be attached to entryPoint to make this reference immutable)
We MUST have the certificate to identify (with immutable references) the smallest "trust domain" relevant for client verification.
This makes sense for trusted builders, which is the north star. I wanted to raise a use-case for npm where it might take a very long time for us to effectively roll out trusted builders in the npm ecosystem given the varied nature of publish workflows in the wild. The majority of existing automated npm publish workflows I've investigated would be hard to support for a trusted builder without a lot of different runtimes and config options.
Until we get to a place where most projects end up using trusted builders, we could definitely use more information in the Fulcio cert to be able to validate that key pieces of the provenance statement have not been falsified.
This might be a bit of an anti-pattern given the preference for trusted builders to solve this problem. But if we had the repo URL, commit SHA, triggering workflow path and SHA, and/or reusable workflow path and SHA, we could compare these values in the Fulcio cert against what's in the provenance statement before accepting the package for publishing.
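A minimal sketch of that comparison, assuming both the certificate values and the provenance statement have already been parsed and flattened into dictionaries with the same keys. The key names and the flattening are assumptions for illustration; a real provenance statement nests these values and a real certificate carries them as extensions.

```python
# Hypothetical flat field names shared by both inputs.
COMPARED_FIELDS = (
    "repository",
    "sha",
    "workflow_ref",
    "workflow_sha",
    "job_workflow_ref",
    "job_workflow_sha",
)


def mismatched_fields(cert_values: dict, provenance: dict,
                      fields: tuple = COMPARED_FIELDS) -> list:
    """Return the fields that are missing from the cert or differ."""
    mismatches = []
    for name in fields:
        cert_value = cert_values.get(name)
        # A value absent from the certificate cannot vouch for the provenance.
        if cert_value is None or cert_value != provenance.get(name):
            mismatches.append(name)
    return mismatches
```

A registry could then reject a publish whenever `mismatched_fields(...)` returns a non-empty list, since the provenance would be claiming something the certificate does not back up.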
Ideally we could access the following GitHub OIDC claims in the Fulcio cert:
- job_workflow_ref
- job_workflow_sha (could this be combined with above in the SAN like ${job_workflow_ref}#${job_workflow_sha}?)
- workflow_ref
- workflow_sha (could this be combined with workflow_ref like: ${workflow_ref}#${workflow_sha})?
- ${repo}@${ref}#${sha}
- sha
- ref
- repo
Another thought, would it make sense to adopt "SLSA" naming for these attributes in the signing cert?
- EntryPointURI: workflow_ref
- EntryPointDigest: workflow_sha
- ConfigSourceURI: job_workflow_ref
- ConfigSourceDigest: job_workflow_sha
- SourceURI: repo@ref
- SourceDigest: sha
- InvocationId: ${run_number}-${run_attempt}
It might seem redundant to include workflow_sha and sha as in GitHub's case they are almost always identical, but there's at least one case where they are not the same when using the pull_request_target event.
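The proposed naming could be sketched as a simple mapping from token claims. This is only an illustration of the list above: it uses the shorthand keys from this comment (for example `repo`), which are not necessarily the literal claim names in a real token, and the function name is made up.

```python
def to_slsa_style(claims: dict) -> dict:
    """Map claim values onto the SLSA-style names proposed above."""
    return {
        "EntryPointURI": claims["workflow_ref"],
        "EntryPointDigest": claims["workflow_sha"],
        "ConfigSourceURI": claims["job_workflow_ref"],
        "ConfigSourceDigest": claims["job_workflow_sha"],
        # SourceURI combines the repository and ref as repo@ref.
        "SourceURI": f'{claims["repo"]}@{claims["ref"]}',
        "SourceDigest": claims["sha"],
        # InvocationId combines run number and attempt as <number>-<attempt>.
        "InvocationId": f'{claims["run_number"]}-{claims["run_attempt"]}',
    }
```

Keeping EntryPointDigest and SourceDigest as separate entries preserves the pull_request_target case where the two SHAs differ.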
@asraa the GH Actions team have just added some new claims to the ID token:
- job_workflow_sha: sha of the reusable workflow if one is used, otherwise will be the sha of the parent/triggering workflow, which can be from a different branch to the source repo/materials (is this version in the draft slsa v1 spec?)
This information should always be present for both the job_workflow and the triggering workflow, even if the caller refers to it by tag / branch. The GitHub context (not OIDC) provides this information for the repository. It's as important that the OIDC token provide this for the workflow / builder as well: sha, ref, ref_type should always be present.
Another thought, would it make sense to adopt "SLSA" naming for these attributes in the signing cert?
- EntryPointURI: workflow_ref
- EntryPointDigest: workflow_sha
- ConfigSourceURI: job_workflow_ref
- ConfigSourceDigest: job_workflow_sha
- SourceURI: repo@ref
- SourceDigest: sha
- Nice to have:
  - InvocationId: ${run_number}-${run_attempt}
Let's think carefully about making the OIDC format dependent on the (evolving) SLSA specs. In v1.0, for example, entryPoint no longer exists. In general, if we only care about the identity, only one of the job_workflow_* or workflow_* sets of information really needs to be part of the OIDC claims. I am not sure the distinction between the two actually matters. In the case of a workflow, the GitHub runner attests (through OIDC) that it runs a workflow:job, which is the identity. In the case of a reusable workflow, the runner attests again to a (reusable) workflow. A single claim could take care of this, since a reusable workflow has a different path than a (traditional / triggering) workflow, so the verifier can infer it. To be more generic, you could have a running-identity-name and a running-identity-type claim: this may be more generic and allows other identity providers to express their identity more flexibly. The reusable workflow can get the triggering information from the GH context. Otherwise, there is no argument that 2 identities is the right number (workflow and job_workflow), and someone may want to have the complete list of nested reusable workflows (4 can be called from one another).
I have another claim to suggest, unrelated to the (great) conversation above about build instructions and references.
Some CI/CD providers allow you to either run your build on their cloud-hosted infrastructure, or let the customer host their own runner infrastructure. In the npm registry, we want to differentiate between builds that ran on cloud-hosted or customer-hosted infrastructure. We think it makes sense to include this claim alongside the other information being securely communicated from the CI/CD system to Fulcio (and then downstream to npm and other package managers).
Runner information makes sense to include, I think, since it's part of the running-identity and identifies the trust boundary.
Hiya! I'm Sam from Buildkite. We're introducing OIDC tokens, and I'm keen to see if we can enable usage of cosign for signing and verifying provenance of containers produced by CI/CD builds.
We include these already:
- aud (can be set to sigstore)
- sub (various attributes composed together identifying the pipeline and some build inputs)
- iss (https://agent.buildkite.com, because tokens are issued by our agent api to agents)
- exp
- iat
- nbf
We include some equivalents to these:
- sha is build_commit, but may be a user-supplied value for manually triggered builds, or HEAD for a new build until resolved
- ref is a combination of build_branch and build_tag
We do not include these:
- job_workflow_ref: the closest might be a reference to the containing pipeline, like https://buildkite.com/buildkite/lifecycled, everything else is dynamic
- event_name: no current equivalent, although we do record whether a build was started manually, by webhook, etc
- repository, workflow: these are roughly the same thing for us, each org (account) has many pipelines which contain many builds (or "runs" in GHA parlance), e.g. https://buildkite.com/buildkite/lifecycled
We add these, which I think are important:
- job_id - a unique id for a particular task within a build run as a concrete process somewhere, GitHub also uses job id I think.
- agent_id - a unique id for the persistent environment in which many jobs may be run
The job_id feels particularly important for provenance, because as much as we'd like CI/CD to be a pure function of few inputs, it's actually a complicated mess of context with network access which can only be completely captured by a reference to the actual task and environment (the job and agent in our case).
In terms of things a user might like to verify, I expect the most common would be the pipeline (or workflow) which produced an image, and the source branch or tag (ref) which was used. These feel like good generic attributes.
"Git Ref" and "Git Commit" for example could be good generic names for the current GitHub attributes "ref" and "sha". "Git Repository" also feels like a good generic attribute, although I would suggest it be a URI to be useful across CI providers instead of a simple org/repo reference.
I don't know a good generic name for pipelines or workflows, the container of many runs of a particular CI/CD workflow, but it's closest to job_workflow_ref. Every provider uses different terminology. In GitHub it's a combo of the repository, and a workflow file location at a ref. In GitLab it's the repository's CI/CD pipelines section; they use "pipeline" to mean one invocation of CI/CD in a repository, which we call a "build" and GitHub calls a "run". AWS CodePipeline uses "pipeline" to mean the container of all invocations, and "pipeline execution" to mean one run of a pipeline with many pieces. GitHub has multiple workflows per repository, GitLab uses the repository directly as the CI/CD container, and Buildkite and AWS CodePipeline pipelines live outside and separately from the repository and the repository can change over time, so the repository alone is also not quite right. I would choose "Pipeline URI", but perhaps that's my bias for our domain language.
There is no standard for CI/CD provider OIDC tokens to my knowledge, and I'm not aware of any drive to standardise at the moment. The domain models vary significantly, too. I suspect normalizing the claims into useful attributes for verifying will need to remain in Fulcio for now. But perhaps there are some common attributes which will emerge and influence the claims generated in future, like GitHub's tokens.
If I had to pick a set of common attributes which would be useful in sigstore right now, it'd be roughly:
That's a whole bunch of thoughts and opinions, I'm not sure how much of it is useful, but hopefully a bit 🙏
Hiya! I'm Sam from Buildkite. We're introducing OIDC tokens, and I'm keen to see if we can enable usage of cosign for signing and verifying provenance of containers produced by CI/CD builds.
We include these already:
- aud (can be set to sigstore)
- sub (various attributes composed together identifying the pipeline and some build inputs)
Do you have an example of what this looks like? I'm curious why you need the build inputs to be part of the token. If your builder can be identified using sub, could the builder create the attestation and store the inputs in it? (instead of packing everything in the certificate?) For interoperability, we're trying to use intoto as the provenance format.
buildkite and aws codepipeline pipelines live outside and separately from the repository and the repository can change over time
You mean the source of the builder, not the source being built, correct? Do you have a link / example?
If I had to pick a set of common attributes which would be useful in sigstore right now, it'd be roughly:
- Git Repository URI
- Git Ref
- Git Commit
I would add Git Ref Type, which indicates if the ref is a branch, tag, etc.
It may be useful to pack these fields into its own struct / field / x509 cert, and version it to allow for flexibility, like:
identity {
version: 1
<other-fields>
}
Fyi, I took a brief look at the SPIFFE ID (https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/), and they don't seem to have much more than what's proposed here (it's just spiffe://<trust-domain>/<path>).
Do you have an example of what this looks like?
aud defaults to https://buildkite.com/<org> but is customisable.
sub currently looks like:
organization:acme-in:pipeline:super-duper-app:ref:refs/heads/super-duper-feature:commit:abc123...:step:build
It's quite symbolic. That's because some consumers of OIDC tokens, i.e. AWS, only allow writing policies based on partial string matches against subjects. It also does not uniquely identify a piece of work. It's not ideal, but it's what is available.
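To illustrate both uses of that subject string, here is a small sketch: one helper splits it into key/value pairs, and one does the kind of partial (wildcard) string match described above. Both helpers are hypothetical, and the parser assumes no value contains a colon, which the example satisfies but which is not guaranteed in general.

```python
from fnmatch import fnmatchcase


def parse_subject(sub: str) -> dict:
    # Assumption: the subject alternates key, value, key, value, ...
    parts = sub.split(":")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected alternating key:value pairs in {sub!r}")
    return dict(zip(parts[0::2], parts[1::2]))


def subject_matches(sub: str, pattern: str) -> bool:
    # Case-sensitive glob match, where "*" stands for any run of characters.
    return fnmatchcase(sub, pattern)


SUBJECT = (
    "organization:acme-in:pipeline:super-duper-app:"
    "ref:refs/heads/super-duper-feature:commit:abc123...:step:build"
)
fields = parse_subject(SUBJECT)
```

A pattern such as `organization:acme-in:pipeline:super-duper-app:*` pins the organization and pipeline while leaving the ref, commit, and step unconstrained, which is why the field order inside the subject matters for this style of policy.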
I'm curious why you need the build inputs to be part of the token. If your builder can be identified using sub, could the builder create the attestation and store the inputs in it? (instead of packing everything in the certificate?) For interoperability, we're trying to use intoto as the provenance format.
The builder could create and store attestations, but most consumers of tokens want to make decisions without round trips back to the builder. And then how does one authenticate back the builder to ask for attestations? If you have a living identity token then maybe it makes sense to use that, but in a signature that token is gone.
Again, in the conversations I've been having, most folks want enough information baked into the tokens and/or signatures to make policy decisions without additional round trips or more external systems involved.
buildkite and aws codepipeline pipelines live outside and separately from the repository and the repository can change over time
You mean the source of the builder, not the source being built, correct? Do you have a link / example?
Hm, yes I think so. Presuming a "builder" means the same thing as a "pipeline" to both Buildkite and CodePipeline, a builder is generally configured with inputs for new builds, and one of those inputs is the source repository. But the source repository can be changed between builds. So the source repository for two builds run by the same builder may not be the same.
I would add Git Ref Type, which indicates if the ref is a branch, tag, etc.
If Git Ref is fully qualified this is already included, no? i.e. refs/heads/some-branch versus refs/tags/v1.2.3
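For what it's worth, deriving the type from a fully qualified ref is a few lines. This sketch only handles the two prefixes mentioned above and labels everything else "unknown"; the function name is made up.

```python
def ref_type(ref: str) -> str:
    """Infer the ref type from a fully qualified git ref."""
    if ref.startswith("refs/heads/"):
        return "branch"
    if ref.startswith("refs/tags/"):
        return "tag"
    # Short names like "main" are ambiguous, so they are not guessed at.
    return "unknown"
```

The inference only works if providers always send the fully qualified form; a bare `main` could be either a branch or a tag.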
Fyi, I took a brief look at the SPIFFE ID (https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/), and they don't seem to have much more than what's proposed here (it's just spiffe://<trust-domain>/<path>).
Yeah, interesting. So that's almost pure identity without attributes, unless you understand the URI format for a particular trust domain. More complex policy decisions would need to start from the identity and consult other systems for more context. If you have control over the shape of the SPIFFE ID then it might be easier, but when using a hosted service where the format is dictated then its idea of the trust domain might vary from your own. For example, some workloads might care about branch, but some might not.
OIDC seems powerful in contrast because complex policy decisions can be made based on the identity token and the contained attributes including provenance information without requiring additional interactions. And these can be varied by consumers without much support from providers (including providers of hosted services).
So I guess it depends on what degree of provenance information sigstore would like to include for policy decisions without additional system dependencies or provider control.
So there's sort of two questions rolled into one here:
It's critical that Fulcio work across many cloud CI/CD providers. But I'm not sure if we'll be able to get all providers to use the same field names. So to help answer question 1, I'm going to reference the current GitHub OIDC token field names for illustrative purposes, even though the goal of this issue (as I understand it!) is to ensure each cloud CI/CD provider is sending Fulcio the information in some field, which may (or may not) have a different name.
At any rate, here's an attempt to summarize where we're at so far.
First there's some standard OIDC fields:
Field Description | GitHub OIDC Field Name | Why / Notes / Questions |
---|---|---|
A user-customizable field set to “sigstore” | aud | To ensure the OIDC tokens are being used for their intended service |
Who issued the token | iss | So we know which platform the token is from |
Timestamp of when the token expires | exp | So a token cannot be used well after it was provisioned |
Timestamp of when the token starts being valid | nbf | To set a start time before which the token cannot be used |
Timestamp of when the token was issued | iat | To audit when the token was issued |
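As a rough sketch of how these standard fields are typically checked once a token's signature has been verified and its payload decoded: the function and parameter names below are made up for illustration, signature verification is deliberately out of scope, and the issuer URL in the usage note is GitHub's Actions token issuer.

```python
import time


def check_standard_claims(claims: dict, expected_issuer: str,
                          expected_audience: str = "sigstore",
                          now: float = None, leeway: float = 0) -> list:
    """Return the names of standard claims that fail validation."""
    now = time.time() if now is None else now
    problems = []
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a list
    if expected_audience not in audiences:
        problems.append("aud")
    if claims.get("iss") != expected_issuer:
        problems.append("iss")
    exp = claims.get("exp")
    if exp is None or now >= exp + leeway:  # expired, or no expiry at all
        problems.append("exp")
    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - leeway:  # not yet valid
        problems.append("nbf")
    iat = claims.get("iat")
    if iat is not None and iat > now + leeway:  # issued in the future
        problems.append("iat")
    return problems
```

For a GitHub token, `expected_issuer` would be `https://token.actions.githubusercontent.com`; an empty result means all five fields passed. The `leeway` parameter is one way to tolerate small clock differences between the issuer and the verifier.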
Then we get into the build attestations:
Field Description | GitHub OIDC Field Name | GitHub OIDC Example Value | Why / Notes / Questions |
---|---|---|---|
Enough information to construct a URL to point to the source code | repository | org/repo | Should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? |
An immutable reference (i.e. a commit SHA) to a specific version of the source code | sha | 01234... | sub field might contain mutable references, and we want to know exactly what version we are using |
The source code branch / tag name, for enforcing policies like "all releases must come from a certain branch" | ref | main | This is useful for writing policies, but could potentially be confusing if it points to a mutable release branch / tag |
Enough information to construct a URL to point at the top-level / initiating build instructions | workflow_file_ref | org/repo/.../builder.yml@...main | Similar to repository, should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? Note that GitHub currently includes branch information in this field, which Fulcio might choose to ignore. Note that this is more precise and would replace the field workflow. |
An immutable reference (i.e. a commit SHA) to a specific version of the top-level / initiating build instructions | workflow_file_sha | 01234... | |
Enough information to construct a URL to point at low-level / specific build instructions that could be maintained by a neutral party like SLSA | job_workflow_ref | slsa-framework/repo/.../reusable-builder.yml@...main | Similar to repository, should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? Note that GitHub currently includes branch information in this field, which Fulcio might choose to ignore. |
An immutable reference (i.e. a commit SHA) to a specific version of the low-level / specific build instructions | job_workflow_sha | 01234... | |
To specify if a build took place in platform-hosted infrastructure or customer-hosted infrastructure | runner_environment | TBD | To distinguish the security properties of a build system where the customer can influence the environment (or not). We suggest values like "platform-hosted" or "self-hosted" for now, which could be extended in the future. |
Was a build triggered by a human or an automatic process? | event_name | workflow_dispatch | This is part of the Fulcio certificate today, but is this something we care about? What are some good platform-neutral values for this field? |
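On the "send a URL vs. construct a URL" question, this is roughly what a consumer has to do today if the provider only sends org/repo style values. It is only a sketch: claimURL is a made-up helper, and the base URL is something the consumer has to know out of band, which is part of the argument for providers sending full URLs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// claimURL joins a provider base URL with a relative claim value such as
// "org/repo" or "org/repo/.github/workflows/build.yml@refs/heads/main".
// Hypothetical helper for illustration only.
func claimURL(base, claim string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return "", fmt.Errorf("base URL %q must be an absolute https URL", base)
	}
	if claim == "" || strings.HasPrefix(claim, "/") {
		return "", fmt.Errorf("claim value %q is not a relative path", claim)
	}
	return strings.TrimSuffix(base, "/") + "/" + claim, nil
}

func main() {
	const base = "https://github.com" // assumed to be known per provider

	repo, err := claimURL(base, "haydentherapper/test-repository")
	if err != nil {
		panic(err)
	}
	fmt.Println(repo)

	signer, err := claimURL(base, "haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main")
	if err != nil {
		panic(err)
	}
	fmt.Println(signer)
}
```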
👋 I opened a draft PR: https://github.com/sigstore/fulcio/pull/945 - attempting to standardize on the Fulcio cert extensions where these claims would end up. Let me know if this would be better suited in a new issue before starting on a PR but seemed easier to collaborate on an actual file.
I would love to see standardized claims in CI provider tokens, but is it reasonable for us to expect providers to actually try to become conformant with a standard created here? As an example, many CI providers failed to correctly implement the aud claim in their tokens. Even with a clear security incentive to make that claim configurable, it's taken a long time for many to fix the problem.
What would incentivize these various platforms to be compliant? What have they gained for their users if they do?
The incentive is ease of integration, and a template for a minimum set of claims to represent an identity. We've had many discussions across issues in this repo about what represents an identity vs what represents provenance. Standardizing on a set of claims makes it clear what we consider to be an identity. Additionally, if a CI provider wants to integrate with Fulcio and has implemented the set of claims, it'll be easy not just for the Fulcio integration in terms of the code that needs to be added, but also for all of the clients that need to verify sigstore-issued certificates. If every CI has its own set of claims/OIDs, it'll be difficult to write verification policies across sigstore clients.
I would love to see standardized claims in CI provider tokens
My thinking with https://github.com/sigstore/fulcio/pull/945 was to standardise on the Fulcio cert extensions that cover the identity. This would effectively standardise on a subset of required OIDC claims, but at the same time not require CI/CD providers to conform to the same claim attribute names. CI/CD specific mapping would still need to exist in Fulcio.
@feelepxyz +1 to standardizing the certificate extensions over the actual token claims. I feel like it's quite a bit easier for providers to marshal / parse existing token claims into the right cert extensions with a small amount of logic in Fulcio itself, instead of requiring changes to their token format.
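For what it's worth, the per-provider glue could be as small as a lookup table. A minimal sketch, assuming claims arrive as plain strings: the provider-neutral field names and the "example CI" claim names below are invented for illustration, and only the GitHub claim names come from this thread.

```go
package main

import "fmt"

// ClaimMapping maps a provider-neutral field name to the claim name a given
// provider uses. The neutral names here are hypothetical.
type ClaimMapping map[string]string

// githubMapping uses the GitHub claim names discussed in this thread.
var githubMapping = ClaimMapping{
	"source_repository": "repository",
	"source_digest":     "sha",
	"source_ref":        "ref",
	"build_signer":      "job_workflow_ref",
	"build_trigger":     "event_name",
}

// exampleCIMapping is for an imaginary provider; its claim names are made up.
var exampleCIMapping = ClaimMapping{
	"source_repository": "project_path",
	"source_digest":     "commit",
	"source_ref":        "branch",
	"build_signer":      "pipeline_file",
	"build_trigger":     "trigger",
}

// Normalize extracts the neutral fields from a provider's token claims and
// fails if any mapped claim is missing or empty.
func (m ClaimMapping) Normalize(claims map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(m))
	for field, claim := range m {
		v, ok := claims[claim]
		if !ok || v == "" {
			return nil, fmt.Errorf("token is missing claim %q (needed for %s)", claim, field)
		}
		out[field] = v
	}
	return out, nil
}

func main() {
	token := map[string]string{
		"repository":       "haydentherapper/test-repository",
		"sha":              "618f07451338511a79a44612ae6bc87622e2f6ec",
		"ref":              "refs/heads/main",
		"job_workflow_ref": "haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main",
		"event_name":       "workflow_dispatch",
	}
	fields, err := githubMapping.Normalize(token)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields)
}
```

With this shape, onboarding a new provider means adding one mapping rather than new certificate-issuing code, and verifiers only ever see the neutral fields.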
I don't think it's likely CI providers (us included) will change OIDC attributes to suit Sigstore, sorry. Those tokens have too many requirements on them already. But I reckon we'll be happy to provide the grunt to glue them together within sigstore/fulcio.
I'm pretty excited that #890 is close to merge. Beyond identifying which pipeline a binary comes from, we have customers asking for the ability to verify which git branch and commit a signed binary comes from, and which build and job (the specific run of a workflow) created a binary too. For example, being able to verify that a binary was produced by an earlier job in the same build, or using the job identity to seek domain-specific attestations via an api. Very keen to see some generalised attributes added. I'm happy to write the plumbing for Buildkite once a direction has been decided.
Reopening during implementation.
I'm starting implementation on this now.
We've got a certificate!
-----BEGIN CERTIFICATE-----
MIIGVzCCBf2gAwIBAgIUBC0AN21K0mDArYsvFMITxLAqhIMwCgYIKoZIzj0EAwIw
aDEMMAoGA1UEBhMDVVNBMQswCQYDVQQIEwJXQTERMA8GA1UEBxMIS2lya2xhbmQx
FTATBgNVBAkTDDc2NyA2dGggU3QgUzEOMAwGA1UEERMFOTgwMzMxETAPBgNVBAoT
CHNpZ3N0b3JlMB4XDTIzMDMxNTIyNDM0NFoXDTIzMDMxNTIyNTM0NFowADBZMBMG
ByqGSM49AgEGCCqGSM49AwEHA0IABN2JaEWm3pvFf5SNN6T/c9AV6GPEQYt+C+qK
67CnRSIJYpMJ6UoFMoaCOIhWlXjBTYqDtt4r85PnC4nJtLx0x+SjggTrMIIE5zAO
BgNVHQ8BAf8EBAMCB4AwEwYDVR0lBAwwCgYIKwYBBQUHAwMwHQYDVR0OBBYEFK17
43RedvZXbIiYWZb8W9oTwwPmMB8GA1UdIwQYMBaAFPlHHwJ/gqhBtZ0dAlWvBMDN
Tv3sMGwGA1UdEQEB/wRiMGCGXmh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50aGVy
YXBwZXIvdGVzdC1yZXBvc2l0b3J5Ly5naXRodWIvd29ya2Zsb3dzL3Rlc3QueWFt
bEByZWZzL2hlYWRzL21haW4wOQYKKwYBBAGDvzABAQQraHR0cHM6Ly90b2tlbi5h
Y3Rpb25zLmdpdGh1YnVzZXJjb250ZW50LmNvbTA7BgorBgEEAYO/MAEVBC0MK2h0
dHBzOi8vdG9rZW4uYWN0aW9ucy5naXRodWJ1c2VyY29udGVudC5jb20wHwYKKwYB
BAGDvzABAgQRd29ya2Zsb3dfZGlzcGF0Y2gwNgYKKwYBBAGDvzABAwQoNjE4ZjA3
NDUxMzM4NTExYTc5YTQ0NjEyYWU2YmM4NzYyMmUyZjZlYzASBgorBgEEAYO/MAEE
BARUZXN0MC0GCisGAQQBg78wAQUEH2hheWRlbnRoZXJhcHBlci90ZXN0LXJlcG9z
aXRvcnkwHQYKKwYBBAGDvzABBgQPcmVmcy9oZWFkcy9tYWluMEsGCisGAQQBg78w
AQgEPQw7aHR0cHM6Ly9naXRodWIuY29tLzYxOGYwNzQ1MTMzODUxMWE3OWE0NDYx
MmFlNmJjODc2MjJlMmY2ZWMwOAYKKwYBBAGDvzABCQQqDCg2MThmMDc0NTEzMzg1
MTFhNzlhNDQ2MTJhZTZiYzg3NjIyZTJmNmVjMB0GCisGAQQBg78wAQoEDwwNZ2l0
aHViLWhvc3RlZDBCBgorBgEEAYO/MAELBDQMMmh0dHBzOi8vZ2l0aHViLmNvbS9o
YXlkZW50aGVyYXBwZXIvdGVzdC1yZXBvc2l0b3J5MDgGCisGAQQBg78wAQwEKgwo
NjE4ZjA3NDUxMzM4NTExYTc5YTQ0NjEyYWU2YmM4NzYyMmUyZjZlYzAfBgorBgEE
AYO/MAENBBEMD3JlZnMvaGVhZHMvbWFpbjAZBgorBgEEAYO/MAEOBAsMCTYwNjIx
MDIxNzAyBgorBgEEAYO/MAEPBCQMImh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50
aGVyYXBwZXIwFwYKKwYBBAGDvzABEAQJDAc4NDE4NzYwMG4GCisGAQQBg78wAREE
YAxeaHR0cHM6Ly9naXRodWIuY29tL2hheWRlbnRoZXJhcHBlci90ZXN0LXJlcG9z
aXRvcnkvLmdpdGh1Yi93b3JrZmxvd3MvdGVzdC55YW1sQHJlZnMvaGVhZHMvbWFp
bjA4BgorBgEEAYO/MAESBCoMKDYxOGYwNzQ1MTMzODUxMWE3OWE0NDYxMmFlNmJj
ODc2MjJlMmY2ZWMwIQYKKwYBBAGDvzABEwQTDBF3b3JrZmxvd19kaXNwYXRjaDBl
BgorBgEEAYO/MAEUBFcMVWh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50aGVyYXBw
ZXIvdGVzdC1yZXBvc2l0b3J5L2FjdGlvbnMvcnVucy80NDMxNTU4NzExL2F0dGVt
cHRzLzIwCgYIKoZIzj0EAwIDSAAwRQIgERwyY9BWWEZMDy28nfvxf8QSYB0taVcD
Yk+81NhN7dICIQC7YFA90OXnmSorP+/ibHNlJX4/9Wo3euYbJC7QMtKr8A==
-----END CERTIFICATE-----
Which expands to:
$ openssl x509 -in cert.txt -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
04:2d:00:37:6d:4a:d2:60:c0:ad:8b:2f:14:c2:13:c4:b0:2a:84:83
Signature Algorithm: ecdsa-with-SHA256
Issuer: C = USA, ST = WA, L = Kirkland, street = 767 6th St S, postalCode = 98033, O = sigstore
Validity
Not Before: Mar 15 22:43:44 2023 GMT
Not After : Mar 15 22:53:44 2023 GMT
Subject:
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:dd:89:68:45:a6:de:9b:c5:7f:94:8d:37:a4:ff:
73:d0:15:e8:63:c4:41:8b:7e:0b:ea:8a:eb:b0:a7:
45:22:09:62:93:09:e9:4a:05:32:86:82:38:88:56:
95:78:c1:4d:8a:83:b6:de:2b:f3:93:e7:0b:89:c9:
b4:bc:74:c7:e4
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature
X509v3 Extended Key Usage:
Code Signing
X509v3 Subject Key Identifier:
AD:7B:E3:74:5E:76:F6:57:6C:88:98:59:96:FC:5B:DA:13:C3:03:E6
X509v3 Authority Key Identifier:
F9:47:1F:02:7F:82:A8:41:B5:9D:1D:02:55:AF:04:C0:CD:4E:FD:EC
X509v3 Subject Alternative Name: critical
URI:https://github.com/haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main
1.3.6.1.4.1.57264.1.1:
https://token.actions.githubusercontent.com
1.3.6.1.4.1.57264.1.21:
.+https://token.actions.githubusercontent.com
1.3.6.1.4.1.57264.1.2:
workflow_dispatch
1.3.6.1.4.1.57264.1.3:
618f07451338511a79a44612ae6bc87622e2f6ec
1.3.6.1.4.1.57264.1.4:
Test
1.3.6.1.4.1.57264.1.5:
haydentherapper/test-repository
1.3.6.1.4.1.57264.1.6:
refs/heads/main
1.3.6.1.4.1.57264.1.8:
.;https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec
1.3.6.1.4.1.57264.1.9:
.(618f07451338511a79a44612ae6bc87622e2f6ec
1.3.6.1.4.1.57264.1.10:
github-hosted .
1.3.6.1.4.1.57264.1.11:
.2https://github.com/haydentherapper/test-repository
1.3.6.1.4.1.57264.1.12:
.(618f07451338511a79a44612ae6bc87622e2f6ec
1.3.6.1.4.1.57264.1.13:
..refs/heads/main
1.3.6.1.4.1.57264.1.14:
..606210217
1.3.6.1.4.1.57264.1.15:
."https://github.com/haydentherapper
1.3.6.1.4.1.57264.1.16:
..8418760
1.3.6.1.4.1.57264.1.17:
.^https://github.com/haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main
1.3.6.1.4.1.57264.1.18:
.(618f07451338511a79a44612ae6bc87622e2f6ec
1.3.6.1.4.1.57264.1.19:
..workflow_dispatch
1.3.6.1.4.1.57264.1.20:
.Uhttps://github.com/haydentherapper/test-repository/actions/runs/4431558711/attempts/2
Signature Algorithm: ecdsa-with-SHA256
Signature Value:
30:45:02:20:11:1c:32:63:d0:56:58:46:4c:0f:2d:bc:9d:fb:
f1:7f:c4:12:60:1d:2d:69:57:03:62:4f:bc:d4:d8:4d:ed:d2:
02:21:00:bb:60:50:3d:d0:e5:e7:99:2a:2b:3f:ef:e2:6c:73:
65:25:7e:3f:f5:6a:37:7a:e6:1b:24:2e:d0:32:d2:ab:f0
Please double check the values match up to what's expected. Something to note is that the value for each new extension is now in line with what RFC5280 requires, a DER encoded string rather than the raw value[1]. This should hopefully mean that off-the-shelf certificate parsing libraries will have an easier time handling custom extensions.
Just cleaning up the code now and then I'll push up a PR with the changes.
[1] This was never brought up by the Golang clients because it's so easy to get the value of a custom certificate extension. The DER encoding adds two bytes, a tag for type (0x0C, meaning a UTF8String) and the length of the value. This change means clients will have to unmarshal the extension now. For Go, this looks like:
var issuerVal string
rest, err := asn1.Unmarshal(issuerExt.Value, &issuerVal)
Very easy still! Now we get the added benefit of being able to specify non-string extension values too.
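A slightly fuller sketch of the same idea, in case it helps client authors. decodeStringExtension and exampleExtension are hypothetical helper names, and the OID is simply the one the issuer appears under in the dump above. Checking that rest is empty guards against trailing bytes after the string.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"errors"
	"fmt"
)

// decodeStringExtension unmarshals a DER-encoded string extension value and
// rejects any trailing bytes. Hypothetical helper for illustration.
func decodeStringExtension(ext pkix.Extension) (string, error) {
	var val string
	rest, err := asn1.Unmarshal(ext.Value, &val)
	if err != nil {
		return "", fmt.Errorf("unmarshal extension %v: %w", ext.Id, err)
	}
	if len(rest) != 0 {
		return "", errors.New("trailing data after extension value")
	}
	return val, nil
}

// exampleExtension builds an extension the way the new encoding is described:
// the value is a DER UTF8String (tag 0x0C) rather than raw bytes.
func exampleExtension(value string) (pkix.Extension, error) {
	der, err := asn1.MarshalWithParams(value, "utf8")
	if err != nil {
		return pkix.Extension{}, err
	}
	return pkix.Extension{
		Id:    asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 57264, 1, 21},
		Value: der,
	}, nil
}

func main() {
	ext, err := exampleExtension("https://token.actions.githubusercontent.com")
	if err != nil {
		panic(err)
	}
	issuer, err := decodeStringExtension(ext)
	if err != nil {
		panic(err)
	}
	fmt.Printf("tag=0x%02X len=%d value=%s\n", ext.Value[0], ext.Value[1], issuer)
}
```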
@haydentherapper awesome! Thanks for taking this on 😍
1.3.6.1.4.1.57264.1.8
Looks like the job_workflow_sha maybe ended up here instead of job_workflow_ref, as this should be the Build Signer URI?
1.3.6.1.4.1.57264.1.10
Maybe just some rendering weirdness but what's up with the value showing up to the left of the period? Also, presuming the prefixes showing up in the above example are part of the encoding somehow? e.g. .^..h, ."h etc.
Everything else looks good to me!
1.3.6.1.4.1.57264.1.21: .+https://token.actions.githubusercontent.com
Is this encoding the issuer as a DER-encoded string? Nit, but should the re-encoded issuer come before Build Signer URI at 1.3.6.1.4.1.57264.1.8 instead of at the end, bumping all the other new ones down one?
Maybe just some rendering weirdness but what's up with the value showing up to the left of the period?
This is just openssl not expecting the extension value to contain an encoded string. The stray characters are the DER-encoded tag and length for the UTF8String.
Old encoding vs new encoding:
The stray characters are the DER-encoded tag and length for the UTF8String.
Nice one 👍
Looks like the job_workflow_sha maybe ended up here instead of job_workflow_ref as this should be the Build Signer URI?
Good catch, fixed!
Is this encoding the issuer as a DER-encoded string?
+1 to what Brian said. For example, for .;https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec, the ; is 0x3B = 59, the length of https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec. The first . is just because openssl can't render 0x0C into ASCII.
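If anyone wants to reproduce the stray characters, here is a small sketch. renderPrintable is a simplified stand-in for what a text dump does (it replaces every byte outside printable ASCII with a dot) and is an assumption, not openssl's exact logic.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// renderPrintable replaces bytes outside printable ASCII with '.'.
// Simplified approximation of a text dump, for illustration only.
func renderPrintable(b []byte) string {
	out := make([]byte, len(b))
	for i, c := range b {
		if c < 0x20 || c > 0x7e {
			out[i] = '.'
		} else {
			out[i] = c
		}
	}
	return string(out)
}

// encodeUTF8String DER-encodes s as an ASN.1 UTF8String.
func encodeUTF8String(s string) ([]byte, error) {
	return asn1.MarshalWithParams(s, "utf8")
}

func main() {
	der, err := encodeUTF8String("https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec")
	if err != nil {
		panic(err)
	}
	// Byte 0 is the tag and byte 1 is the length. A single length byte is
	// only used for values shorter than 128 bytes.
	fmt.Printf("tag=0x%02X length=%d (%q)\n", der[0], der[1], der[1])
	fmt.Println(renderPrintable(der))
}
```

The same arithmetic may explain the odd-looking 1.3.6.1.4.1.57264.1.10 line: github-hosted is 13 bytes long, so its length byte is 0x0D, a carriage return, which a terminal would not show as a visible character.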
1.3.6.1.4.1.57264.1.21
Yea, I can make that change to move this to .8 and bump all OIDs by 1.
Goal
Create a standard set of claims that should be present in OIDC tokens from CI systems such as GitHub Actions, Cirrus CI, GitLab, Circle CI, etc.
Background
As noted in the NPM RFC for integrating with Sigstore, and as documented in other tickets (https://github.com/sigstore/fulcio/issues/243, https://github.com/sigstore/fulcio/issues/591, https://github.com/sigstore/fulcio/issues/748), there is interest in support for other CI systems. It is technically possible to implement support for each, but it will require code duplication and work for onboarding every CI platform. It would be ideal if all OIDC tokens from all CI systems had a standard set of claims to represent identity, so that onboarding would simply be updating configuration.
Current state
All of the above platforms either are working on or currently produce OIDC tokens for CI workflows. Fulcio currently only accepts CI tokens from GitHub Actions, and has hardcoded the GitHub specific claim values and produces a code signing certificate with GitHub specific OID values.
Currently expected claims (GitHub ref)
job_workflow_ref
sha
event_name
repository
workflow
ref
aud (which must be set to sigstore)
exp
sha, event_name, repository, workflow, and ref are included in issued certificates in custom OIDs: https://github.com/sigstore/fulcio/blob/main/docs/oid-info.md
Required claims
The token should include standard OIDC claims like:
aud (which must be customizable and set to sigstore)
sub
iss
exp
iat
nbf
We should include the claims specified in "Currently expected claims".
There was conversation in https://github.com/sigstore/fulcio/issues/624 about including the run ID (run_id), run count (run_number) and attempt count (run_attempt). We should decide if these should be required for Fulcio certificates.
Another useful claim may be actor, who triggered the CI run.
Any claim values must be immutable. For example, user IDs should be used instead of usernames, and repository IDs should be used instead of repository names, to prevent resurrection attacks.
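To illustrate the immutability point, here is a rough sketch of a verification policy that pins a repository ID and treats the name as display-only. The types are hypothetical, and the ID values are placeholders chosen for the example.

```go
package main

import "fmt"

// Identity holds fields a verifier might extract from a certificate.
// Hypothetical type for illustration only.
type Identity struct {
	Repository   string // mutable, human-readable name
	RepositoryID string // immutable ID assigned by the provider
}

// Policy pins the immutable ID; the name is kept only for error messages.
type Policy struct {
	Repository   string
	RepositoryID string
}

// Check accepts an identity only if its repository ID matches the pinned ID,
// regardless of what the repository is currently named.
func (p Policy) Check(id Identity) error {
	if id.RepositoryID == "" {
		return fmt.Errorf("certificate for %q carries no repository ID", id.Repository)
	}
	if id.RepositoryID != p.RepositoryID {
		return fmt.Errorf("repository ID %s does not match pinned ID %s for %s",
			id.RepositoryID, p.RepositoryID, p.Repository)
	}
	return nil
}

func main() {
	policy := Policy{Repository: "haydentherapper/test-repository", RepositoryID: "606210217"}

	original := Identity{Repository: "haydentherapper/test-repository", RepositoryID: "606210217"}
	// Same name, different ID: what a deleted-and-recreated repository looks like.
	resurrected := Identity{Repository: "haydentherapper/test-repository", RepositoryID: "999999999"}

	fmt.Println("original:", policy.Check(original))
	fmt.Println("resurrected:", policy.Check(resurrected))
}
```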
cc @asraa @laurentsimon @znewman01 @fkorotkov @feelepxyz, what would you like to see in a token and do you have recommendations on claim names?