slsa-framework / slsa-github-generator

Language-agnostic SLSA provenance generation for Github Actions
Apache License 2.0

[feature] [byo] Record the GitHub context & event payload in SLSA provenance #1505

Open asraa opened 1 year ago

asraa commented 1 year ago

Describe the bug: SLSA provenance currently allows only string-to-string ParameterValues, but the GitHub event payload is a JSON object.

While we may be able to flatten it, it's inconvenient.

Add this if SLSA v1.0 is updated to allow objects or more complex JSON types.
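To illustrate what flattening would involve, here is a minimal sketch that turns a nested payload into a string-to-string map. The dotted-path key convention, the `flatten` helper, and the sample payload are assumptions made for illustration; nothing here is taken from the generator.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested JSON into a dict mapping dotted paths to string values."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Strings are kept as-is; other scalars are JSON-encoded to strings.
        flat[prefix] = value if isinstance(value, str) else json.dumps(value)
    return flat


# Hypothetical, heavily trimmed event payload.
payload = {
    "pull_request": {"base": {"ref": "main"}, "draft": False},
    "labels": ["a", "b"],
}
flat = flatten(payload)
```

The inconvenience shows up quickly: types are lost (`false` becomes the string `"false"`), empty objects and arrays vanish, and a verifier has to know the path convention to read the values back.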

asraa commented 1 year ago

Also relevant here is the context.

We currently record these fields in SLSA v1.0 provenance: https://github.com/slsa-framework/slsa-github-generator/blob/76f03fa7e30209f32ac76ce417ddc43ff98af42a/.github/actions/verify-token/src/predicate.ts#L158-L184 (the same fields will be reused generically for BYOB as well).

Currently, the human-readable ACTOR and other fields that may contain PII are not recorded. If we included the full context, we would have to provide an opt-out model for the included parameters.

My suggestion here is to add, from the github context and the event payload, all "allowlisted" safe and relevant fields (e.g. ensure that base_ref is added from event_payload). I would prefer this to an opt-out model. For example:

The context also includes whether the ref was protected. IMO that is important but I remember we have had discussions on whether this information is "public".
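A rough sketch of the allowlist approach, assuming the github context and event payload are available as plain dicts. The field names and paths in the two allowlists, and the helpers `pick_keys` and `pick_paths`, are hypothetical and only show the shape of the idea; they are not the set of fields the generator records.

```python
# Hypothetical allowlists: only fields listed here ever reach the provenance.
CONTEXT_ALLOWLIST = {"event_name", "ref", "ref_protected", "base_ref", "actor_id"}
EVENT_PATH_ALLOWLIST = ["pull_request.base.ref"]


def pick_keys(context, allowlist):
    """Keep only the top-level context keys that are allowlisted."""
    return {key: value for key, value in context.items() if key in allowlist}


def pick_paths(payload, paths):
    """Keep only the allowlisted dotted paths that exist in a nested payload."""
    picked = {}
    for path in paths:
        node = payload
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                break  # Path is absent; skip it silently.
            node = node[part]
        else:
            picked[path] = node
    return picked


# Hypothetical inputs containing both safe and PII-bearing fields.
context = {"event_name": "pull_request", "actor": "octocat", "actor_id": "583231"}
event = {"pull_request": {"base": {"ref": "main"}, "user": {"login": "octocat"}}}

safe_context = pick_keys(context, CONTEXT_ALLOWLIST)
safe_event = pick_paths(event, EVENT_PATH_ALLOWLIST)
```

With an allowlist, a field that GitHub adds to the context later is excluded by default, whereas an opt-out model would record it until someone notices.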

cc @laurentsimon @kommendorkapten

laurentsimon commented 1 year ago

Good point. There's a lot of value in recording those fields (actor_id, owner_id), as we explained in https://slsa.dev/blog/2022/06/slsa-github-workflows: it allows monitoring for changes caused by account / repo re-creation.

What if we could record opii = H(Nonce, _pii_) where Nonce is a 128-bit nonce / secret that only the builder knows? It would provide privacy but allow for linkability between two attestations, which would allow monitoring for changes.

We could further scope the obfuscated version by repo, with opii = H(Nonce, repo_name, _pii_)
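A minimal sketch of the repo-scoped variant. H is not specified above; HMAC-SHA256 keyed with the nonce is assumed here as one reasonable choice, and the `obfuscate` helper and the length-prefixed encoding of the inputs are likewise assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def obfuscate(nonce: bytes, repo_name: str, pii: str) -> str:
    """Compute opii = H(Nonce, repo_name, pii), with H assumed to be HMAC-SHA256."""
    message = b""
    for field in (repo_name, pii):
        data = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        message += len(data).to_bytes(4, "big") + data
    return hmac.new(nonce, message, hashlib.sha256).hexdigest()


# 128-bit secret known only to the builder (generated here for the example).
nonce = secrets.token_bytes(16)

first = obfuscate(nonce, "octo-org/octo-repo", "octocat")
second = obfuscate(nonce, "octo-org/octo-repo", "octocat")
other_repo = obfuscate(nonce, "octo-org/other-repo", "octocat")
```

Two attestations for the same repo and actor yield the same value, so a monitor can detect a change without learning the underlying identity. Scoping by repo means the same actor cannot be linked across repositories.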

laurentsimon commented 1 year ago

Synced with @asraa today. The team agreed that recording a non-human-readable version of each field (field_id) is acceptable, so let's ignore my proposal!

asraa commented 1 year ago

I've added the GH event payload here: https://github.com/slsa-framework/slsa-github-generator/pull/1611

I took a look at the reasoning in slsa-verifier for the necessary fields here.

We will never need the event.repository context (the necessary content is elsewhere, and this leaks owner info), so I think we can scrub that.
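As a sketch of what scrubbing could look like, assuming the event payload is a plain dict: the `scrub_event` helper is hypothetical and is not taken from the linked PR.

```python
import copy


def scrub_event(payload, scrubbed_keys=("repository",)):
    """Return a copy of the event payload without the scrubbed top-level keys."""
    cleaned = copy.deepcopy(payload)
    for key in scrubbed_keys:
        cleaned.pop(key, None)  # Ignore keys that are already absent.
    return cleaned


# Hypothetical, heavily trimmed push event.
event = {
    "ref": "refs/heads/main",
    "repository": {"owner": {"login": "octocat"}},
}
scrubbed = scrub_event(event)
```

Working on a deep copy leaves the original payload untouched for any other step that still needs it.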

asraa commented 1 year ago

https://github.com/slsa-framework/slsa-github-generator/issues/1575#issuecomment-1409547881

For the rest of the work of masking sensitive info, see here