slsa-framework / slsa-github-generator

Language-agnostic SLSA provenance generation for Github Actions
Apache License 2.0
413 stars, 127 forks

[feature] workflow for publishing source code archives as release assets #2951

Open junyer opened 10 months ago

junyer commented 10 months ago

Is your feature request related to a problem? Please describe.

Bazel recommends publishing source code archives as release assets – and Bazel Central Registry verifies stability by checking for …/releases/download/… in GitHub URLs. Using gh release download and gh release upload, GitHub Actions can automate this trivially, but OpenSSF punishes projects whose release assets lack signature and provenance.

Describe the solution you'd like

SLSA should provide a workflow for publishing source code archives as release assets with signature and provenance. Ideally, any project's release workflow could include a job specifying only permissions and uses keys and get .zip, .zip.intoto.jsonl, .tar.gz and .tar.gz.intoto.jsonl files attached to the release.

Describe alternatives you've considered

Letting N different projects implement this themselves in approximately N different ways. ;)

Additional context

N/A

laurentsimon commented 10 months ago

Hi, thanks for the issue. You can achieve this today using the generic generator (https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md) and setting upload-assets: true. Can you confirm this works for your use case?

junyer commented 10 months ago

I'm confident that it would, but the idea here is to spare every project the sha256sum … | base64 -w0 and base64-subjects dance and, moreover, to ensure that the gh release download and gh release upload dance is done by a trusted reusable workflow. It is – or it ought to be! – a common operation with approximately zero room for variations, so it should be made as convenient for every project as reasonably possible. Does that clarify the intention behind the request? :)
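For readers unfamiliar with the step being described, the following is a minimal sketch of the hashing part. It assumes GNU coreutils (`base64 -w0` is a GNU option), and the function name `subjects_b64` is made up for illustration; it is not part of the generator.

```shell
#!/bin/sh
# Hypothetical helper: print the base64-encoded `sha256sum` output for the
# given files on a single line. This is the kind of value a calling workflow
# passes to the generic generator as `base64-subjects`.
# Assumes GNU coreutils, since `-w0` (no line wrapping) is a GNU option.
subjects_b64() {
  sha256sum "$@" | base64 -w0
}
```

In a workflow step, the result would typically be written to a job output, for example `echo "hashes=$(subjects_b64 *.zip *.tar.gz)" >> "${GITHUB_OUTPUT}"`, where the output name `hashes` is arbitrary.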

laurentsimon commented 10 months ago

I understand the sha256sum … | base64 -w0 part. I don't fully understand the gh release download and gh release upload part. The generator I linked to already does the release upload when you set upload-assets: true. Can you clarify this point?

junyer commented 10 months ago

Just to be clear, the idea is not to generate the source code archives manually. (GitHub already does that automatically.) The reason to do the gh release download and gh release upload dance is to take the source code archives that are available via …/archive/refs/tags/… and make them available via …/releases/download/….
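A rough sketch of that download-and-upload step, without any signing or provenance, might look like the following. It assumes an authenticated GitHub CLI (`gh`) and an existing release for the tag; the function name `mirror_source_archives` is hypothetical.

```shell
#!/bin/sh
# Sketch: copy GitHub's auto-generated source code archives for a tag into
# that tag's release assets. Assumes `gh` is installed and authenticated.
# `mirror_source_archives` is a hypothetical name used for illustration.
mirror_source_archives() {
  repo=$1   # e.g. "google/re2"
  tag=$2    # e.g. "2023-11-01"
  workdir=$(mktemp -d) || return 1
  for format in zip tar.gz; do
    # --archive fetches the source code archive instead of a release asset.
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$workdir" && gh release download "$tag" --repo "$repo" --archive "$format" ) || return 1
  done
  # Attach whatever was downloaded, keeping the filenames gh chose.
  gh release upload "$tag" --repo "$repo" "$workdir"/* || return 1
}
```

The workflow requested in this issue would do the same thing inside a trusted reusable workflow and additionally attach the .intoto.jsonl files.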

laurentsimon commented 10 months ago

Ah, I missed this. Thank you. I'm not super familiar with how …/archive/refs/tags/… archives are generated, and by whom. Are these the ones GitHub generates automatically? Are they not the same as the ones present in the release assets? Or are you merely saying that the API / URL to download them is not consistent with the APIs / URLs used to download other assets in the release (I've not verified if this is the case, just trying to parse your comment), and so you want to add them to the release explicitly?

junyer commented 10 months ago

As linked above, https://blog.bazel.build/2023/02/15/github-archive-checksum.html describes the situation quite well, I think, with the screenshot illustrating the difference between …/archive/refs/tags/… and …/releases/download/… in terms of the release assets. The problem that the gh release download and gh release upload dance solves is one of stability.

I should just clarify that the filenames in the …/archive/refs/tags/… URLs are not the filenames that GitHub actually serves. For the 2023-11-01 release of RE2, for example, GitHub will do the following:

https://github.com/google/re2/archive/refs/tags/2023-11-01.zip -> location: https://codeload.github.com/google/re2/zip/refs/tags/2023-11-01 -> content-disposition: attachment; filename=re2-2023-11-01.zip
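To check this for another repository, one option is to follow the redirect and read the final Content-Disposition header. The helper below is only a sketch with a hypothetical name (`served_filename`); it parses whatever response headers it is given on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# filename from the last Content-Disposition header, if there is one.
served_filename() {
  tr -d '\r' |
    sed -n 's/^[Cc]ontent-[Dd]isposition:.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*$/\1/p' |
    tail -n 1
}
```

Usage, assuming GitHub answers HEAD requests with the same headers as the trace above: `curl -sIL https://github.com/google/re2/archive/refs/tags/2023-11-01.zip | served_filename`.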

Likewise, gh release download uses the "real" filename, so the workflow that I'm proposing would not have to rename files. It's just about doing the gh release download and gh release upload dance and, in the process, generating signature and provenance.

ianlewis commented 9 months ago

Maybe this could be an option on the generic generator? I'm not sure we need a totally separate workflow. WDYT?

junyer commented 9 months ago

I had "do one thing and do it well" in mind, I think, when I suggested another workflow. generator_generic_slsa3.yml has a lot of knobs whereas this use case needs approximately zero knobs. Reusing the generic generator makes sense, of course, but I would argue that there's value in encapsulating/hiding its complexity.

ianlewis commented 9 months ago

Yeah. I hear that. I think 95% of the code would be the same though. The only difference would be that we could omit base64-subjects and base64-subjects-as-file. I think all the other inputs would still be relevant. https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md#workflow-inputs

junyer commented 9 months ago

Fair enough. :)

junyer commented 5 months ago

In light of CVE-2024-3094, could this effort possibly be prioritised? ;)

junyer commented 5 months ago

In case it helps, I wrote https://github.com/google/re2/blob/main/.github/workflows/release.yml earlier today. :)

laurentsimon commented 5 months ago

Hey, sorry for the late reply. Thanks for putting an example together. We could support this using BYOB: https://slsa.dev/blog/2023/08/bring-your-own-builder-github and https://github.com/slsa-framework/slsa-github-generator/blob/main/BYOB.md

We should be able to take your example and wrap it in a BYOB fairly easily. This would let your users (and other projects' users) use the slsa-verifier to verify, out of the box, with a common builder.

The only thing we need to change is that the command gh release create "${GITHUB_REF_NAME}" --generate-notes --latest --verify-tag --repo "${GITHUB_REPOSITORY}" would be run by repository owners, and then they'd call our tar / zip builder to create the tarball / zip and upload it. Please correct me if that's incorrect.

Would that work? Happy to help make that happen.

junyer commented 5 months ago

IIUC, yes, the PW would run gh release create (except using the API instead of the CLI) and then invoke the TRW, which would handle everything else. Although now I'm guessing that SLSA generating signature and provenance would make Sigstore signing superfluous. :)

laurentsimon commented 5 months ago

> IIUC, yes, the PW would run gh release create (except using the API instead of the CLI) and then invoke the TRW, which would handle everything else. Although now I'm guessing that SLSA generating signature and provenance would make Sigstore signing superfluous. :)

You would not need the sigstore signatures, but the SLSA builders use Sigstore too :) There are 2 (related) advantages to a common builder:

  1. Other projects can use the same builder and inspect the code once
  2. No need for users to read each project's workflow code to read how the tarball / zip files are created

If you're OK with all that, I can go ahead and turn your example into a SLSA builder.

laurentsimon commented 5 months ago

Note to myself https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#download-a-repository-archive-tar

junyer commented 4 months ago

> If you're OK with all that, I can go ahead and turn your example into a SLSA builder

SGTM. Thanks! :D

laurentsimon commented 4 months ago

Hey, a few months ago GitHub made (then reverted) the tarballs non-deterministic, see https://github.com/orgs/community/discussions/45830. To avoid this sort of problem in the future, the builder could create the archives itself and upload them to the release. That would also let us support other types of archive (if we need to) in the future. WDYT of this approach?

junyer commented 4 months ago

Downloading the source code archives and uploading them as release assets is sufficient, AFAIK, because that's what confers stability. I reckon that creating them (i.e. explicitly) wouldn't add value... but would add complexity. If somebody downloads a slightly different one later because they clicked or pasted the wrong link, I don't think it actually matters who or what created the one at the right link. Or am I misunderstanding a risk from a trust perspective here?

laurentsimon commented 4 months ago

> Downloading the source code archives and uploading them as release assets is sufficient, AFAIK, because that's what confers stability. I reckon that creating them (i.e. explicitly) wouldn't add value... but would add complexity.

Not super complicated I think. The BYOB framework already clones the repo, so code is available. We would just need to zip / tar it which does not seem too complicated.

> If somebody downloads a slightly different one later because they clicked or pasted the wrong link, I don't think it actually matters who or what created the one at the right link. Or am I misunderstanding a risk from a trust perspective here?

Stability requires that archives downloaded from GitHub (with the same link) be deterministic. The link above shows that GitHub generates archives on the fly when they are requested (to save storage space, I think). This means signature verification would fail if the archive at download time differs from the one that was originally signed. In the incident linked above, GitHub had changed the compression algorithm, so the archive at download time no longer matched the archive at signing time.
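The failure mode can be illustrated with a plain digest comparison. This is a sketch only: real verification would go through slsa-verifier and the provenance, and the function name `matches_recorded_digest` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED,
# i.e. the digest that was recorded when the archive was signed.
matches_recorded_digest() {
  file=$1
  expected=$2
  actual=$(sha256sum "$file" | cut -d ' ' -f 1) || return 1
  [ "$actual" = "$expected" ]
}
```

If an archive is regenerated with different compression settings, its bytes change, so this check fails even though the unpacked source tree is identical.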

Lmk what you think.

junyer commented 4 months ago

I think you might be confusing source code archives and release assets. The image below (taken from the Bazel blog) hopefully clarifies:

Note well that downloading the source code archives and uploading them as release assets makes them stable as release assets, not as source code archives. That's why I'm arguing that, at release time, it doesn't matter whether the source code archives are deterministic. If somebody ends up using a copy of a nondeterministic file, then it really doesn't matter how the deterministic file was created, does it?

laurentsimon commented 4 months ago

My bad, I had not seen that you're re-uploading the source archives as release assets. I thought your example workflow only signed the downloaded source archives without re-uploading them. Either re-creating the source archives or downloading the existing ones works. Is it fair to say that how the archive is created is an implementation detail you don't care too much about? Or do you care? I'll probably download them to simplify the first iteration, but I would like to know if it makes a difference for your use case, in particular security-wise.

junyer commented 4 months ago

To date, manual manipulation of the source code archives is the big problem. Trusting their creation to GitHub seems no more risky than trusting everything else to GitHub, honestly, so obtaining them from GitHub – as opposed to creating them explicitly – suits me just fine.

laurentsimon commented 4 months ago

note to myself https://github.com/actions/toolkit/tree/main/packages/artifact

ramonpetgrave64 commented 4 months ago

Release artifacts are mutable. I think if we can guarantee the archives to be reproducible, we should try to do it. https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
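For reference, the options from the GNU tar manual page linked above can be combined roughly as follows. This is a sketch that assumes GNU tar 1.28 or later (for `--sort`) and gzip; the function name `repro_targz` and its arguments are made up for illustration and are not taken from the draft builder.

```shell
#!/bin/sh
# Sketch: build a byte-for-byte reproducible .tar.gz of a directory.
# Assumes GNU tar >= 1.28 and gzip. `repro_targz` is a hypothetical name.
repro_targz() {
  srcdir=$1   # directory to archive
  out=$2      # output .tar.gz path (should live outside srcdir)
  epoch=$3    # fixed timestamp, in seconds since the Unix epoch
  # Fix member order, timestamps and ownership, and drop the pax headers
  # that would otherwise carry the process ID, atime and ctime.
  tar --sort=name --format=posix \
      --pax-option='exthdr.name=%d/PaxHeaders/%f' \
      --pax-option='delete=atime,delete=ctime' \
      --mtime="@${epoch}" \
      --numeric-owner --owner=0 --group=0 \
      -C "$(dirname "$srcdir")" -cf - "$(basename "$srcdir")" |
    gzip -n > "$out"   # -n: leave the name and timestamp out of the gzip header
}
```

File modes are left as they are on disk here, so two builds only match if the checkouts have the same permissions; the manual's `--mode` option can normalize those as well.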

laurentsimon commented 4 months ago

The draft PR is https://github.com/slsa-framework/slsa-github-generator/pull/3587. I think we need to tweak it to reduce permissions by using https://github.com/slsa-framework/slsa-github-generator/blob/main/.github/workflows/delegator_lowperms-generic_slsa3.yml, and then we're good to go.