slsa-framework / slsa

Supply-chain Levels for Software Artifacts
https://slsa.dev

Rename "hosted" to "dedicated"? #947

Open MarkLodato opened 10 months ago

MarkLodato commented 10 months ago

There is recurring confusion over the word "hosted", with many readers incorrectly interpreting that to mean some sort of external cloud provider. Instead, the intention is really just that it runs on a dedicated machine rather than an individual's workstation.[^reproducible-builds] In fact, the requirements say exactly that:

All build steps ran using a hosted build platform on shared or dedicated infrastructure, not on an individual’s workstation.

For v1.1, any thoughts on replacing "hosted" with "dedicated"? Would that make the intent more clear?

Sample changes:

| Before | After |
| --- | --- |
| L2: Hosted build platform | L2: Dedicated build platform |
| "All build steps ran using a hosted build platform on shared or dedicated infrastructure," | "All build steps ran using a dedicated build platform on shared or dedicated infrastructure," |
| "A build platform is often a hosted, multi-tenant build service" | "A build platform is often a dedicated, multi-tenant build service" |

[^reproducible-builds]: In the case of reproducible builds, the rebuilder that is trusted would be the dedicated machine; it's fine for the original build to be on a workstation since it's bit-for-bit identical.

arewm commented 10 months ago

I think that this clarity is helpful, except for the second change, where "dedicated" is repeated. I was trying to think of an alternative for the first use.

An alternative with a different word:

All build steps ran using a single-purpose build platform on shared or dedicated infrastructure

Or maybe it works sufficiently well with it removed:

All build steps ran using a build platform on shared or dedicated infrastructure,

MarkLodato commented 10 months ago

Yeah, I didn't love the double "dedicated" but gave up trying to solve it. Your second suggestion (removing it before "build platform") sounds good to me.

joshuagl commented 10 months ago

The change to dedicated seems reasonable to me, especially with the removal of dedicated before build platform. 👍

david-a-wheeler commented 10 months ago

I think the term "dedicated" is more confusing, not less. The term "dedicated" has other meanings.

Instead, I think the problem is that there's no clear definition of the term "hosted". The earlier text about hosted never says "The term hosted means..." or anything else that indicates it's a definition. E.g.:

A hosted system is ... and is not ...

david-a-wheeler commented 10 months ago

Current text:

Hosted: All build steps ran using a hosted build platform on shared or dedicated infrastructure, not on an individual’s workstation. Examples: GitHub Actions, Google Cloud Build, Travis CI

First cut revisions:

Hosted: All build steps ran using a hosted build platform. A hosted build platform is a shared or dedicated infrastructure used for building and is maintained by a team. An individual’s workstation is, by definition, not a hosted build platform. Examples: GitHub Actions, Google Cloud Build, Travis CI

CircuitSwan commented 10 months ago

Hosted: A system on which all build steps run. A hosted build platform may be external or internal, shared or dedicated infrastructure used for building which is well maintained. An individual’s workstation is, by definition, not a hosted build platform.

Examples: GitHub Actions, Google Cloud Build, Travis CI

CircuitSwan commented 10 months ago

Also we should add this to the "terminology" page and link to it

david-a-wheeler commented 10 months ago

Here's another try:

Hosted build platform: A system on which all build steps run (in particular its hardware and operating system). A hosted build platform may be external or internal, shared or dedicated infrastructure used for building. Such a system must be well maintained, including hardening against attack, and not controlled by the individual requesting a build (to provide separation of concerns). An individual’s workstation is, by definition, not a hosted build platform. Examples: GitHub Actions, Google Cloud Build, CircleCI

I think a key part of being "hosted" is that it emphasizes a "separation of concerns" (the build platform is operated by different people than the person who uses it). Obviously those who write the build scripts can cause the build to do bad things, but then the version control system tracks who did that.

I hope you don't mind, but I switched Travis to CircleCI; I think that's a better example.

jkjell commented 10 months ago

Given how we're trying to (re)define this, the examples seem orthogonal to the definition. It sounds more like we're defining a property of the Build Platform, of which most CI systems would be included, right? For instance, can we define a negative example? We defined the negative property (i.e. a developer's laptop).

arewm commented 10 months ago

While an individual's laptop is a negative property, I think it fits well as a negative example too. If you want a different specific example then you would likely need to call out some specific package/community directly since these are by definition not distributed systems.

Would it be simpler to use the word "managed" instead? We can also alleviate some confusion around "hosted" by specifically including on-premise and cloud infrastructure as fitting the definition. Here is an example of the change with context expanded to the containing paragraphs:

| Before | After |
| --- | --- |
| L2: Hosted build platform | L2: Managed build platform |
| "A package’s build platform is the infrastructure used to transform the software from source to package. This includes the transitive closure of all hardware, software, persons, and organizations that can influence the build. A build platform is often a hosted, multi-tenant build service, but it could be a system of multiple independent rebuilders, a special-purpose build platform used by a single software project, or even an individual’s workstation." | "A package’s build platform is the infrastructure used to transform the software from source to package. This includes the transitive closure of all hardware, software, persons, and organizations that can influence the build. A build platform is often a managed, multi-tenant build service, but it could be a system of multiple independent rebuilders, a special-purpose build platform used by a single software project, or even an individual’s workstation." |
| "All build steps ran using a hosted build platform on shared or dedicated infrastructure, not on an individual’s workstation. Examples: GitHub Actions, Google Cloud Build, Travis CI." | "All build steps ran using a managed build platform on shared or dedicated infrastructure either owned by the build platform or hosted on public infrastructure. Examples: GitHub Actions, Google Cloud Build, Travis CI. Counter examples: Individual's workstations." |

MarkLodato commented 10 months ago

Maybe we should focus more on why the requirement exists, before we choose a name or definition. @david-a-wheeler raised separation of concerns, and I also raised https://slsa.dev/spec/v1.0/principles#corollary-minimize-the-number-of-trusted-platforms. But we don't have agreement here, as discussed in yesterday's spec meeting. Once we agree on what the objective of the requirement is, that should help us narrow down what does and does not satisfy that objective.

david-a-wheeler commented 10 months ago

@MarkLodato :

Maybe we should focus more on why the requirement exists, before we choose a name or definition.

Fair enough. Since this isn't documented (or, it appears, agreed on), I suggest working backwards to identify a list of reasons people might want this requirement, then try to hone in on the ones we (as a group) agree are important, so that we can clearly state it. BTW, I think it's quite possible to include a requirement for multiple reasons (that's not a problem).

arewm commented 10 months ago

Some initial whys that come to mind:

However, none of these really make sense in terms of the build track's levels itself.

In the discussions of the future Build track's levels, it has been mentioned that each track should effectively have a primary goal which all levels can be measured against to reach some goal (should this be official/semi-official and documented somewhere on the website?). For the build track, the goal is to generate an accurate, complete, and authentic provenance describing the build.

In looking at the L2 for provenance, the hosted requirement seems to fit most with the clause:

Define trust: Identify the build platform and other entities that are necessary to trust in order to trust the artifact they produced. [ref]

To this end, having a dedicated/hosted/[...] platform is a step in defining the entity that is the build platform. If the platform is run on some shared resource, then we cannot indicate as clearly where the transitive closure ends in order to define the platform:

System that allows tenants to run builds. Technically, it is the transitive closure of software and services that must be trusted to faithfully execute the build. It includes software, hardware, people, and organizations. [ref]

sudo-bmitch commented 10 months ago

Thinking about the value this gives me, I'd phrase this as wanting a "well maintained and properly secured build server".

We can then list examples of what we consider typically approved (SaaS solutions and on-prem hosted CI) and rejected (a developer's personal machine). Importantly, I wouldn't consider a 5-year-old unpatched Jenkins server exposed to the public internet a well maintained server, and so it shouldn't be approved just because it's a separate server that's not the developer's laptop.

arewm commented 10 months ago

This conversation will likely run up against the Build Platform Operations track, so we will need to ensure that we keep those conversations distinct (while still understanding how they relate). I don't think the running (i.e. patching, firewall rules, etc.) of the build platform would fall within the hosted/dedicated clarification. Any clarification here should assume that the platform is well-intended/operationalized.

david-a-wheeler commented 10 months ago

@arewm said:

  • Increases isolation between build platform developers/operators and the build platform itself. Specifically, it would require a malicious actor to pivot after a compromise. For example, an exploit from a compromised email read on a build platform wouldn't immediately grant access to the build system. ...
  • Actions taken by the developers/operators on the build platform's systems would not be conflated with those on a separate system (i.e. when assessing access logs).

In our discussions these were the primary purposes I had in mind. As I said earlier, "I think a key part of being "hosted" is that it emphasizes a "separation of concerns" (the build platform is operated by different people than the person who uses it)."

It also provides some resilience if the lead maintainer disappears. I'm personally dealing with this on a side project. The lead, a friend of mine (Norm Megill), died unexpectedly, and he used his personal computer to do all the builds. That computer was soon going to disappear, so I had to move building from his personal system to a build that can be maintained by others.

  • Enable specific operational controls (in the future) to be implemented on systems whose dedicated purpose is the build platform.

That's not a bad reason, but I suspect we want to identify a few specific controls that would make it worth the trip.

joshuagl commented 10 months ago

One reason the requirement exists is to reduce the number of systems a consumer must trust. I think of this often in the context of getting packages from a Linux distro vs. an upstream-produced package. If I trust my distro vendor, I can get hundreds of trusted packages as part of that decision. If I want to individually retrieve all of my packages from the upstream locations, I have to decide whether I trust the output of multiple build systems (if I can even determine what the build system is).

sudo-bmitch commented 10 months ago

It also provides some resilience if the lead maintainer disappears.

I think this is a bigger challenge that should be addressed directly rather than with indirect build server requirements. There are other issues caused by a single point of failure, including access to the repositories to push new releases, signing build results, and the general likelihood of the project continuing without a key maintainer. Given that the issue spans beyond the build process itself, I'm not sure if we want a "no single point of failure" requirement for just the build that may get extended later, or if there's a better way to capture it.

This does raise a general concern of mine that OpenSSF may want a track for lone developer projects that want to improve their security without dealing with projects like SLSA and others that mark them as insecure, removing any incentive to add higher level security features like reproducible builds.

david-a-wheeler commented 10 months ago

@sudo-bmitch : The OpenSSF Best Practices badge is specifically designed so the "passing" and "silver" criteria can be met by a single developer. "Gold" can't, because it includes requirements that require multiple developers, but there are many things that can be practically done even by a single-person project. I think that's true for many other things as well.

In the case of hosting, a single-person project can choose to use a hosted system, so that they don't have to do it all themselves. I don't see why a single-person project can't do this. If I'm mistaken, please enlighten me!

sudo-bmitch commented 10 months ago

In the case of hosting, a single-person project can choose to use a hosted system, so that they don't have to do it all themselves. I don't see why a single-person project can't do this. If I'm mistaken, please enlighten me!

@david-a-wheeler I'm kinda debating both for and against something at the same time, which is confusing. I completely agree that a single developer could implement a hosted build, which means that SLSA 1.0 may not prevent issues encountered by projects when a lead maintainer is no longer available.

If we were to fix that with a requirement to avoid single points of failure, then my suggestion is to ensure the solo developers have some kind of incentive to continue adding security features to their projects and not just stop once they hit that requirement. (I.e. in Best Practices have a Solo-Gold, which I believe would get a lot of attention given the number of single maintainer projects out there.)

Either way, I think the "single point of failure" and the hosted build platform should probably be kept separate, so I'll stop here to avoid derailing the issue.

MarkLodato commented 10 months ago

For the build track, the goal is to generate an accurate, complete, and authentic provenance describing the build.

I agree with this characterization, with the addition from levels.md:

The primary purpose of the build track is to enable verification that the artifact was built as expected

This is mostly a repeat of what was said above, but in case repeating helps us get toward consensus:

By the way, if the reasoning gets long, we could optionally hide it behind <details> tag.

steve-work-account commented 10 months ago

I was confused by the wording of "hosted" and asked in Slack. After being directed to this issue and understanding what "hosted" means in this context, I think it might also be worth adding an entry to the Terminology page:

https://slsa.dev/spec/v1.0/terminology#build-model

That entry would give a clear definition of what is meant by "hosted", and a link to that definition could then be inserted on: https://slsa.dev/spec/v1.0/levels#build-l2-hosted-build-platform

chizou commented 6 months ago

Can the examples be updated as well? Every example provided is a hosted service, which I think is adding to the current confusion and might continue to confuse even if you change the verbiage.