Adding build provenance for all registries doc

steiza commented 1 year ago

As suggested by @di

EDIT: After the EMEA working group meeting, I realized I should spell out in greater detail what the aim is here.

We've seen great success with npm build provenance, with over 1,000 packages adopting it in about 2 months. In that time, I've been having 1-1 conversations with other registries interested in implementing build provenance. While these conversations have been great, they are 1-1, and don't have durable public artifacts we can point people to.

While this document today covers design and implementation decisions primarily from the point of view of npm, as other ecosystems adopt build provenance we can update it to reflect what those ecosystems have learned as well.

znewman01 commented 1 year ago

Oops, just saw this after I lit up your old Google doc draft with comments. I won't copy them all here but tl;dr I think this looks good overall, I think there's a few structural changes you could make to make this more readable by someone who's not quite as familiar with the way it works in npm.

Eh2406 commented 1 year ago

IMO we could spin our wheels forever on this but I'd love to just get it in front of a repository operator who's not familiar with this effort and see their reaction, so I'm okay to merge. Let's resolve others' comments as well first though :)

Hi! I am a maintainer of Cargo (the package manager for Rust) and am friendly with the maintainers of crates.io (the associated public registry). I do have some familiarity with the ongoing provenance efforts, but my understanding is only from the outside reading people's blog posts. So I'm very close to the target audience you describe. Someone with decision-making authority to follow npm's lead and interest in understanding what they have done, but who has not yet put in the effort to make the decisions from first principles.

The goal here is for the WG to produce something proscriptive, for other software repositories, based on the prior experience of npm/PyPI.

A prescriptive document that would be convincing for me would:

start by describing the problem
flowing from that description would articulate what properties a solution would need to have
describe the simplest thing that could meet those requirements, and point out why it would be unsatisfactory
iteratively and incrementally add complexity to the design doing justification at each step (this one can be skipped or shortened depending on author effort)
describe why adopting the established tools meets those requirements

Stating that differently, I'm desperate to ask the author of this post... What were the alternatives? Why did you do this implementation as opposed to the alternatives (including doing nothing)? Not just for the overall project, but for each decision?

The document as written asked me to reverse engineer your goals. It describes what you did, and prescribes that I should do likewise. It does not communicate with me what parts are critical and what parts are accidental to your implementation. Figuring out what someone intended from their implementation is reverse engineering, which is easier than inventing it from scratch, but is still a lot of work.

Let me see if I can outline the document I would love to read, given my limited understanding...

The problem is that on most registries it is not clear where the uploaded artifacts came from. It looks like they came from some kind of "normal build" from the linked "version control system". But it's entirely possible that what was uploaded to the registry has nothing to do with the sources or methods it links to. For many artifacts they in fact come from a normal CI run from the official version control system, it would be nice for these packages to be able to show publicly that this step of the process was not hacked for this upload. If we trust the CI provider more than we trust the uploader, then the CI provider can attest to the authenticity of the uploaded artifact. GitHub OIDC provides this, as do OIDC and end points from other CI providers. If the registry just uses this to prove that this individual upload is what it says it is, then you have the trusted publisher system Pypi is experimenting with. Which is a huge security improvement and should be copied by others. But we can go further. If the registry published the OIDC token along with the artifact, then users could verify that this artifact had not been tampered with in this way. Unfortunately, ... (I don't know this step just guessing here.) These CI provided OIDC tokens can only be verified as authentic during a short window after their creation. (I don't know if this one is factually true.) Therefore we need something to check the CI provided OIDC and provide a more permanent attestation of its validity. The registry could just publish the attestation "This CI provided OIDC was valid at time of publish", but the registry is now providing the artifact and the proof that we should trust the artifact. This leaves us vulnerable to the registry being hacked. Therefore, we need a separate independent long-term attestation service. But it could give different answers to different users... So we need a log service... (And this is where my knowledge of how the pieces fit together and why they're justified completely falls apart.)

steiza commented 1 year ago

Stating that differently, I'm desperate to ask the author of this post... What were the alternatives?

This is good feedback. This doc came from conversations with other registry maintainers who already decided on providing build provenance with Sigstore. Instead of trying to remake this doc, I think it would make sense to write a "prequel" with a working title of "Alternatives to Sigstore for Build Provenance". I'm not sure I'll get to posting a reviewable draft of that this week, but your data point of requesting that content is useful!

tiegz commented 1 year ago

Instead of trying to remake this doc, I think it would make sense to write a "prequel" with a working title of "Alternatives to Sigstore for Build Provenance".

I'm just lurking here, and not sure if this would help, but I did a summary of package-signing efforts across platforms in 2020. A little outdated now tho, with SigStore: https://blog.tidelift.com/package-signing

evankanderson commented 1 year ago

I'm wondering if something like the following would help:

A long, long time ago (the 1990s), software dependencies were installed manually on each development machine. Eventually, the installation process was standardized by language-specific package repositories. This enabled developers to easily install and share libraries, and was a major contributor to the growth of open source.

Unfortunately, the ease of consuming open source introduced a number of new supply chain attacks on applications: typosquatting, package takeovers, packaging compromises, and others (insert handy link). It is difficult today for software consumers (developers) to be sure that the software that they consume is safe. One tool which can help ensure that developers are consuming the intended package is package signatures provided by a tool like sigstore.

ossf / wg-securing-software-repos

Adding build provenance for all registries doc #17