Feature: dry run and JSON output

laurentsimon commented 2 years ago

I'm working on generating SLSA provenance level 3 for ko using GitHub workflows - see https://security.googleblog.com/2022/04/improving-software-supply-chain.html

It's easy to generate provenance for the ko command + the env variables; however, we lose some information about the go command itself. I would very much like to be able to add the go commands to the provenance.

So I think what I need is:

1. A dry run option that resolves dynamically resolved flags (ldflags) without running the compilation steps. I'm asking for a dry run, rather than for ko to print this info after compilation, because the arguments are not trusted: someone may pass go build -toolexec (which executes an arbitrary command) and hijack the information. So I need this information before any user-defined arguments are "run".
2. Any other useful information that can be computed without running user input, though I'm not sure what's feasible here. The SBOM for the Go program could be determined post-build if the right flags are passed to the Go compiler, and I don't think you can figure out which entries of go.sum make it into the final build without running the compiler anyway. The list of base images would be useful too, but it may change between a dry run and an actual run, so it would not be reliable, and it can be determined post-build. Any advice/feedback welcome, though!
3. A way to print this in a JSON format that can be parsed by the caller, which I can then use to generate the provenance.

(1) and (3) are what I'm interested in, and any recommendations for (2) are welcome :-)
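
To make the request concrete, here is a rough sketch of what a dry run could emit. It is written in Python for brevity and does not describe anything ko does today: the `{{.Env.NAME}}` placeholder syntax, the `resolve` and `dry_run` functions, and the JSON field names are all assumptions made up for illustration. The only point is that the flags are resolved and printed without starting any subprocess.

```python
import json
import re

# Assumed placeholder syntax for this sketch: {{.Env.NAME}}.
_ENV_REF = re.compile(r"\{\{\s*\.Env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve(template, env):
    """Expand {{.Env.NAME}} references, failing on undefined variables."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined environment variable: {name}")
        return env[name]

    return _ENV_REF.sub(lookup, template)


def dry_run(import_path, flags, ldflags, env):
    """Return the go command that would be run, as JSON, without running it."""
    command = ["go", "build"] + [resolve(flag, env) for flag in flags]
    if ldflags:
        command.append("-ldflags=" + " ".join(resolve(flag, env) for flag in ldflags))
    command.append(import_path)
    # Hypothetical output shape: the resolved command plus the env it was resolved against.
    return json.dumps({"command": command, "env": env}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(dry_run("./cmd/app", ["-trimpath"],
                  ["-X main.version={{.Env.VERSION}}"], {"VERSION": "v1.2.3"}))
```

A caller such as a trusted builder could parse this output and record it in the provenance before any user-controlled flag has had a chance to execute.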

laurentsimon commented 2 years ago

/cc @puerco

imjasonh commented 2 years ago

ko can generate this provenance before/while executing a build, and upload it along with the SBOM. This would require some integration with keys or keyless OIDC flows, so we can attest what we saw, but we want that anyway (#357), so I think we should just wrap this up in that.

This seems easier than asking some other tool to invoke ko in a particular way and parse its output. This also eliminates issues where behavior changes between the dry run and the real run.

laurentsimon commented 2 years ago

I have a PoC that wraps ko, does keyless signing, and uploads the provenance using a reusable workflow, like in https://security.googleblog.com/2022/04/improving-software-supply-chain.html

Note that to achieve SLSA3+, ko needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable

I'm more than happy to use a ko feature. If it could satisfy the "non-forgeable go command", that'd be very useful

imjasonh commented 2 years ago

> Note that to achieve SLSA3+, ko needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable

Can you elaborate a bit? ko should have no problem attesting the go command it will invoke, right before it invokes it. How could the go command be forgeable?

(I'm not disagreeing, I genuinely think you know more about this than I do, I'm just trying to understand the potential attack)

laurentsimon commented 2 years ago

> > Note that to achieve SLSA3+, ko needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable

> Can you elaborate a bit? ko should have no problem attesting the go command it will invoke, right before it invokes it. How could the go command be forgeable?

ko sees the input before invoking the go compiler. ko then invokes the compiler using system() or fork-then-exec (please correct me if I'm wrong), neither of which provides strong isolation. For example, there's a go build option called -toolexec that I think may be able to do arbitrary things (I've not checked). In a nutshell, at SLSA3+, we cannot trust the user-defined input, because there's a chance these inputs could hijack the machine, maybe patch some libraries, write into /proc//mem, etc., and overwrite the final go command with something of their choosing.

That's the overall reasoning. Ideally we need a way to isolate the build from the provenance generation.

imjasonh commented 2 years ago

One option might be to simply disallow -toolexec. Based on docs, it sounds like a pretty power-user feature, and one that may invalidate a lot of our assumptions/guarantees around safety (not to mention cacheability). So let's just disable it, and fail if you ask us to use it.

WDYT? We can add that check pretty easily today, even before considering provenance.
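
As a rough illustration of such a check (sketched in Python, not taken from ko's code), the snippet below rejects `-toolexec` whether it arrives on the command line or through the `GOFLAGS` environment variable. The function names `flag_name` and `check_no_toolexec` are hypothetical. The check is deliberately conservative: it inspects every argument, so a flag value that merely looks like `-toolexec` is rejected too.

```python
def flag_name(arg):
    """Return the flag name of a go argument, or "" if it is not a flag.

    Go's flag parsing accepts -name, --name, -name=value and --name=value.
    """
    if not arg.startswith("-") or arg in ("-", "--"):
        return ""
    stripped = arg[2:] if arg.startswith("--") else arg[1:]
    return stripped.split("=", 1)[0]


def check_no_toolexec(args, env):
    """Raise ValueError if -toolexec is requested via arguments or GOFLAGS."""
    # GOFLAGS is a space-separated list of flag settings applied by the go command.
    goflags = env.get("GOFLAGS", "").split()
    for source, values in (("arguments", args), ("GOFLAGS", goflags)):
        for value in values:
            if flag_name(value) == "toolexec":
                raise ValueError(f"-toolexec is not allowed (found in {source}: {value!r})")


if __name__ == "__main__":
    check_no_toolexec(["-trimpath", "-mod=vendor"], {})  # passes silently
    try:
        check_no_toolexec(["-toolexec=/tmp/wrapper"], {})
    except ValueError as err:
        print(err)
```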

laurentsimon commented 2 years ago

> One option might be to simply disallow -toolexec. Based on docs, it sounds like a pretty power-user feature, and one that may invalidate a lot of our assumptions/guarantees around safety (not to mention cacheability). So let's just disable it, and fail if you ask us to use it.

Sounds good to me. This is the list of arguments we allow in our PoC: https://github.com/slsa-framework/slsa-github-generator-go/blob/main/pkg/build.go#L41-L48. I think it's over-constrained, but we wanted to explore the tradeoffs and threat model first. We may allow more arguments in the end.

> WDYT? We can add that check pretty easily today, even before considering provenance.

Thanks. Let me know if you have more thoughts on this.

laurentsimon commented 2 years ago

Another way for a user to hijack their SBOM is to define LD_PRELOAD as an env variable and commit a .so in their repo. I think this would also allow arbitrary commands to be run during compilation, which would invalidate the SLSA provenance. But a dry run would help here, I think.

Thoughts?

imjasonh commented 2 years ago

At some level this isn't even a ko-specific issue, but I think we should figure this out so other tools can follow our lead.

One nice thing about including build tags and envs in the attestation is that a consumer can inspect it to tell whether they trust it. Did it include -toolexec? Was LD_PRELOAD set to something suspicious? Maybe don't trust it.

If trusting the output of go is an issue, we can collect some stuff before we invoke go and either fail loudly if the report differs meaningfully, or only report what the pre-build values were (probably the former over the latter).
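
A minimal sketch of the "fail loudly" variant, in Python and under assumptions: the report is modeled as a plain dictionary captured once before invoking go and once after, and both the field names and the `changed_fields` / `verify_unchanged` helpers are hypothetical.

```python
_MISSING = object()  # distinguishes "key absent" from "key present with value None"


def changed_fields(pre_build, post_build):
    """Return the sorted names of fields that differ between the two reports."""
    keys = sorted(set(pre_build) | set(post_build))
    return [
        key for key in keys
        if pre_build.get(key, _MISSING) != post_build.get(key, _MISSING)
    ]


def verify_unchanged(pre_build, post_build):
    """Fail loudly if anything recorded before the build no longer matches."""
    changed = changed_fields(pre_build, post_build)
    if changed:
        raise RuntimeError("build report changed during the build: " + ", ".join(changed))


if __name__ == "__main__":
    pre = {"command": ["go", "build", "./cmd/app"], "env": {"CGO_ENABLED": "0"}}
    post = {"command": ["go", "build", "-trimpath", "./cmd/app"], "env": {"CGO_ENABLED": "0"}}
    print(changed_fields(pre, post))
```

Note that this comparison runs in the same process that launched the build, so on its own it does not address a build step that tampers with that process.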

laurentsimon commented 2 years ago

> At some level this isn't even a ko-specific issue, but I think we should figure this out so other tools can follow our lead.

You're absolutely right. We're thinking of writing a blog post to highlight these problems and explain more reliable ways to do that.

> One nice thing about including build tags and envs in the attestation is that a consumer can inspect it to tell whether they trust it. Did it include -toolexec? Was LD_PRELOAD set to something suspicious? Maybe don't trust it.

+1. This is the idea behind the SLSA provenance and policies client-side.
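
A client-side policy along those lines could look roughly like the Python sketch below. The `command` and `env` fields are a simplified stand-in invented for this example, not the actual SLSA provenance schema, and `policy_violations` is a hypothetical helper.

```python
def policy_violations(provenance):
    """Return human-readable reasons not to trust a (simplified) provenance record."""
    violations = []
    for arg in provenance.get("command", []):
        # Accept both -toolexec and --toolexec, with or without =value.
        name = arg.lstrip("-").split("=", 1)[0]
        if arg.startswith("-") and name == "toolexec":
            violations.append(f"build command uses -toolexec: {arg}")
    preload = provenance.get("env", {}).get("LD_PRELOAD")
    if preload:
        violations.append(f"LD_PRELOAD was set during the build: {preload}")
    return violations


if __name__ == "__main__":
    record = {
        "command": ["go", "build", "-toolexec=/tmp/wrapper", "./cmd/app"],
        "env": {"LD_PRELOAD": "./lib/hook.so"},
    }
    for reason in policy_violations(record):
        print(reason)
```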

> If trusting the output of go is an issue, we can collect some stuff before we invoke go and either fail loudly if the report differs meaningfully, or only report what the pre-build values were (probably the former over the latter).

How about a --dry-run option :-)

The main issue is making sure the dry run and the actual run pull in the exact same dependencies. In Go, we can vendor dependencies using go mod vendor and then run go build -mod=vendor. I think it should actually be doable to:

1. Do a dry run where you only vendor dependencies and create the SBOM.
2. Let the caller (the SLSA trusted builder) take the vendored dependencies and copy them to another VM.
3. Re-run ko in the VM with the vendored dependencies, and enforce that -mod=vendor is set in the build command (and that no other -mod= options appear, which is what we do in the Go PoC we have).
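
The enforcement in the last step can be sketched as follows. This is an illustrative Python version, not the code from the Go PoC, and the helper names `mod_values` and `check_mod_vendor` are hypothetical. It accepts both `-mod=vendor` and `-mod vendor`, and requires that exactly one `-mod` setting is present.

```python
def mod_values(args):
    """Collect every value given to the -mod flag, in order of appearance."""
    values = []
    for index, arg in enumerate(args):
        if not arg.startswith("-"):
            continue
        # Any run of leading dashes is treated as a flag prefix (stricter than go itself).
        name, has_value, value = arg.lstrip("-").partition("=")
        if name != "mod":
            continue
        if has_value:
            values.append(value)
        elif index + 1 < len(args):
            values.append(args[index + 1])  # "-mod vendor" form
        else:
            values.append("")  # dangling "-mod" with no value
    return values


def check_mod_vendor(args):
    """Raise ValueError unless the build uses -mod=vendor and no other -mod setting."""
    values = mod_values(args)
    if values != ["vendor"]:
        raise ValueError(f"expected exactly one -mod=vendor, got -mod values: {values}")


if __name__ == "__main__":
    check_mod_vendor(["-trimpath", "-mod=vendor"])  # passes silently
    try:
        check_mod_vendor(["-mod=vendor", "-mod=mod"])
    except ValueError as err:
        print(err)
```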

I think that'd work for Go, but I'm not sure about the container's side of things. I suppose you'd need to also vendor the base images you're downloading and let the ko command know that it should use them instead of fetching the latest in the second (compilation) run?

imjasonh commented 2 years ago

> How about a --dry-run option :-)

My main resistance to a dry run option is that it pushes the work to compare the dry run output and the "real" output to the user, or at least some other orchestration layer invoking ko (and what if ko is compromised?!)

I'd prefer for ko to always produce reliable secure results, or at least understand when/how it can't. I think we can do it, we just need to limit some dangerous options, which we can do.

laurentsimon commented 2 years ago

> > How about a --dry-run option :-)

> My main resistance to a dry run option is that it pushes the work to compare the dry run output and the "real" output to the user, or at least some other orchestration layer invoking ko (and what if ko is compromised?!)

The trusted builder running ko is able to report the ko version that was used. In my PoC, I use the binary from your release; I think long-term I may re-compile the project instead to be able to list its dependencies and report this in the provenance.

> I'd prefer for ko to always produce reliable secure results, or at least understand when/how it can't. I think we can do it, we just need to limit some dangerous options, which we can do.

An allow list (instead of a disallow list) is more reliable in general. What are the env variables ko takes as input? Most start with KO_, but there also seems to be a KIND_ option mentioned in the README?

Having this well documented can also help the builder's writer. In my case I'd like to only allow the env variables that ko knows it should handle, for example.
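
For example, a builder could filter the environment along these lines. This is a Python sketch under assumptions: the `KO_` prefix and `KIND_CLUSTER_NAME` come from this thread rather than from an authoritative list of what ko reads, and `filter_env` is a hypothetical helper.

```python
# Assumed allowlist for illustration only; the real set of variables ko reads
# would have to come from ko's documentation.
ALLOWED_PREFIXES = ("KO_",)
ALLOWED_NAMES = frozenset({"KIND_CLUSTER_NAME"})


def filter_env(env):
    """Split env into (kept, dropped_names) according to the allowlist."""
    kept = {
        name: value
        for name, value in env.items()
        if name in ALLOWED_NAMES or name.startswith(ALLOWED_PREFIXES)
    }
    dropped = sorted(set(env) - set(kept))
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = filter_env({
        "KO_DOCKER_REPO": "registry.example.com/app",
        "KIND_CLUSTER_NAME": "dev",
        "LD_PRELOAD": "./lib/hook.so",
    })
    print("kept:", sorted(kept))
    print("dropped:", dropped)
```

A real builder would also need to decide how to handle variables the toolchain itself relies on, such as PATH, which this sketch drops.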

imjasonh commented 2 years ago

> An allow list (instead of a disallow list) is more reliable in general. What are the env variables ko takes as input? Most start with KO_, but there also seems to be a KIND_ option mentioned in the README?

> Having this well documented can also help the builder's writer. In my case I'd like to only allow the env variables that ko knows it should handle, for example.

An allowlist of go tool flags will inevitably become burdensome, I worry -- e.g., Go v1.1X adds -foo and now ko has to add it to the allowlist and cut a release to avoid breakages. We've already had some issues with such Go version incompatibilities. (In general I agree with you about allow vs deny, but I'm a bit scared about the prospect in this case.)

ko has minimal envs itself, and almost none affect building (KIND_CLUSTER_NAME affects publishing but not building), but the passthru Go flags/envs are the real problem.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Keep fresh with the 'lifecycle/frozen' label.

ko-build / ko

Feature: dry run and JSON output #684