Closed laurentsimon closed 2 years ago
/cc @puerco
`ko` can generate this provenance before/while executing a build, and upload it along with the SBOM. This would require some integration with keys or keyless OIDC flows, so we can attest what we saw, but we want that anyway (#357), so I think we should just wrap this up in that.
This seems easier than asking some other tool to invoke `ko` in a particular way and parse its output. This also eliminates issues where behavior changes between the dry run and the real run.
I have a PoC that wraps ko, does keyless signing, and uploads the provenance using a reusable workflow, like in https://security.googleblog.com/2022/04/improving-software-supply-chain.html
Note that to achieve SLSA3+, `ko` needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable.
I'm more than happy to use a ko feature. If it could satisfy the "non-forgeable go command", that'd be very useful.
> Note that to achieve SLSA3+, `ko` needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable
Can you elaborate a bit? `ko` should have no problem attesting the `go` command it will invoke, right before it invokes it. How could the `go` command be forgeable?
(I'm not disagreeing, I genuinely think you know more about this than I do, I'm just trying to understand the potential attack)
> > Note that to achieve SLSA3+, `ko` needs to generate the provenance before running the go compiler. But generally, even if you do that, the compiler run is not isolated from ko itself, so I'm not sure this would achieve SLSA level 3+. What I mean is that it'd be a little harder to claim that the go command reported is non-forgeable
>
> Can you elaborate a bit? `ko` should have no problem attesting the `go` command it will invoke, right before it invokes it. How could the `go` command be forgeable?
ko sees the input before invoking the go compiler. ko then invokes the compiler using system() or fork-then-exec (please correct me if I'm wrong), neither of which provides strong isolation. For example, there's a go build option called `-toolexec` that I think may be able to do arbitrary things (I've not checked). In a nutshell, at SLSA3+, we cannot trust the user-defined input, because there's a chance these inputs could hijack the machine, maybe patch some libraries, write into /proc/
That's the overall reasoning. Ideally we need a way to isolate the build from the provenance generation.
One option might be to simply disallow `-toolexec`. Based on docs, it sounds like a pretty power-user feature, and one that may invalidate a lot of our assumptions/guarantees around safety (not to mention cacheability). So let's just disable it, and fail if you ask us to use it.
WDYT? We can add that check pretty easily today, even before considering provenance.
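A minimal sketch of what such a check could look like, assuming the user-supplied build flags are available as a slice of strings. The function name `rejectToolexec` is hypothetical and is not part of ko.

```go
package main

import (
	"fmt"
	"strings"
)

// rejectToolexec (hypothetical helper) returns an error if any argument sets
// the -toolexec flag. The go command accepts one or two leading dashes and
// both the "-flag value" and "-flag=value" forms, so the name is normalized.
func rejectToolexec(flags []string) error {
	for _, f := range flags {
		if !strings.HasPrefix(f, "-") {
			continue // not a flag (for example a package path or a flag value)
		}
		name := strings.TrimLeft(f, "-")
		if i := strings.Index(name, "="); i >= 0 {
			name = name[:i]
		}
		if name == "toolexec" {
			return fmt.Errorf("flag %q is not allowed: -toolexec can run arbitrary commands", f)
		}
	}
	return nil
}

func main() {
	for _, flags := range [][]string{
		{"-trimpath", "-ldflags=-s -w"},
		{"-toolexec=/tmp/wrapper"},
	} {
		if err := rejectToolexec(flags); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("ok:", flags)
		}
	}
}
```

This only inspects flags passed explicitly. The `GOFLAGS` environment variable can also supply flags to the go command, so a complete check would likely need to cover it as well.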
> One option might be to simply disallow `-toolexec`. Based on docs, it sounds like a pretty power-user feature, and one that may invalidate a lot of our assumptions/guarantees around safety (not to mention cacheability). So let's just disable it, and fail if you ask us to use it.
Sounds good to me. This is the list of arguments we allow in our PoC: https://github.com/slsa-framework/slsa-github-generator-go/blob/main/pkg/build.go#L41-L48. I think it's over-constrained, but we wanted to explore the tradeoffs and threat model first. We may allow more arguments in the end.
> WDYT? We can add that check pretty easily today, even before considering provenance.
Thanks. Let me know if you have more thoughts on this.
Another way a user could hijack their SBOM is to define `LD_PRELOAD` as an env variable and commit a `.so` to their repo. I think this would also allow arbitrary commands to be run during compilation, which would invalidate the SLSA provenance. But a dry run would help here, I think.
Thoughts?
At some level this isn't even a ko-specific issue, but I think we should figure this out so other tools can follow our lead.
One nice thing about including build tags and envs in the attestation is that a consumer can inspect it to decide whether they trust it. Did it include `-toolexec`? Was `LD_PRELOAD` set to something suspicious? Maybe don't trust it.
If trusting the output of `go` is an issue, we can collect some stuff before we invoke `go` and either fail loudly if the report differs meaningfully, or only report what the pre-build values were (probably the former over the latter).
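The consumer-side inspection described above can be sketched as follows. `BuildRecord` is a hypothetical, simplified stand-in for whatever an attestation records about flags and environment; it is not the SLSA provenance schema, and the two checks are only the examples raised in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildRecord is a hypothetical, simplified view of what an attestation
// might record about a build. It is NOT the SLSA provenance schema.
type BuildRecord struct {
	Flags []string          // flags passed to go build
	Env   map[string]string // environment variables visible to the build
}

// findings lists reasons a consumer might decline to trust the record.
func findings(rec BuildRecord) []string {
	var out []string
	for _, f := range rec.Flags {
		if !strings.HasPrefix(f, "-") {
			continue
		}
		name := strings.TrimLeft(f, "-")
		if name == "toolexec" || strings.HasPrefix(name, "toolexec=") {
			out = append(out, "build used "+f)
		}
	}
	if v := rec.Env["LD_PRELOAD"]; v != "" {
		out = append(out, "LD_PRELOAD was set to "+v)
	}
	return out
}

func main() {
	rec := BuildRecord{
		Flags: []string{"-trimpath", "-toolexec=./wrap.sh"},
		Env:   map[string]string{"LD_PRELOAD": "./hook.so"},
	}
	for _, reason := range findings(rec) {
		fmt.Println("untrusted:", reason)
	}
}
```

A check like this is only meaningful if the recorded flags and environment were captured before the build ran, which is the concern raised earlier in the thread.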
> At some level this isn't even a ko-specific issue, but I think we should figure this out so other tools can follow our lead.
You're absolutely right. We're thinking of writing a blog post to highlight these problems and explain more reliable ways to do that.
> One nice thing about including build tags and envs in the attestation is that a consumer can inspect it to decide whether they trust it. Did it include `-toolexec`? Was `LD_PRELOAD` set to something suspicious? Maybe don't trust it.
+1. This is the idea behind the SLSA provenance and client-side policies.
> If trusting the output of `go` is an issue, we can collect some stuff before we invoke `go` and either fail loudly if the report differs meaningfully, or only report what the pre-build values were (probably the former over the latter).
How about a `--dry-run` option :-)
The main issue is to be sure the dry run and the actual run pull in the exact same dependencies. In Go, we can vendor dependencies using `go mod vendor` and then do `go build -mod=vendor`. I think it should actually be doable to check that `-mod=vendor` is set in the build command (and that no other `-mod=` options appear, which is what we do in the Go PoC we have).
I think that'd work for Go, but I'm not sure about the container side of things. I suppose you'd need to also vendor the base images you're downloading and let the ko command know that it should use them instead of fetching the latest in the second (compilation) run?
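A sketch of the `-mod=vendor` check, assuming the build arguments are available as a slice of strings. The helper name `checkModVendor` is hypothetical, and this is not the code from the PoC mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkModVendor (hypothetical helper) verifies that the arguments contain
// exactly one -mod setting and that its value is "vendor". Both "-mod=value"
// and "-mod value" are recognized, with one or two leading dashes.
func checkModVendor(args []string) error {
	var values []string
	for i := 0; i < len(args); i++ {
		name := strings.TrimLeft(args[i], "-")
		if name == args[i] {
			continue // no leading dash, so not a flag
		}
		switch {
		case strings.HasPrefix(name, "mod="):
			values = append(values, strings.TrimPrefix(name, "mod="))
		case name == "mod":
			if i+1 == len(args) {
				return errors.New("-mod is missing its value")
			}
			i++ // consume the value that follows the flag
			values = append(values, args[i])
		}
	}
	if len(values) != 1 {
		return fmt.Errorf("expected exactly one -mod setting, found %d", len(values))
	}
	if values[0] != "vendor" {
		return fmt.Errorf("expected -mod=vendor, found -mod=%s", values[0])
	}
	return nil
}

func main() {
	fmt.Println(checkModVendor([]string{"build", "-mod=vendor", "./..."}))
	fmt.Println(checkModVendor([]string{"build", "-mod=vendor", "-mod=mod", "./..."}))
}
```

Requiring exactly one occurrence is a deliberately strict choice here: it avoids having to reason about which of several `-mod` settings the go command would honor.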
> How about a `--dry-run` option :-)
My main resistance to a dry run option is that it pushes the work of comparing the dry run output and the "real" output to the user, or at least some other orchestration layer invoking `ko` (and what if `ko` is compromised?!)
I'd prefer for `ko` to always produce reliable secure results, or at least understand when/how it can't. I think we can do it, we just need to limit some dangerous options, which we can do.
> > How about a `--dry-run` option :-)
>
> My main resistance to a dry run option is that it pushes the work of comparing the dry run output and the "real" output to the user, or at least some other orchestration layer invoking `ko` (and what if `ko` is compromised?!)
The trusted builder running ko is able to report the ko version that was used. In my PoC, I use the binary from your release; I think long-term I may re-compile the project instead to be able to list its dependencies and report this in the provenance.
> I'd prefer for `ko` to always produce reliable secure results, or at least understand when/how it can't. I think we can do it, we just need to limit some dangerous options, which we can do.
An allow list (instead of a disallow list) is more reliable in general. What are the env variables ko takes as input? Most start with `KO_`, but there also seems to be a `KIND_` option mentioned in the readme?
Having this well documented can also help the builder's writer. In my case I'd like to only allow the env variables that ko knows it should handle, for example.
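As an illustration of the allow-list idea, here is a sketch of a builder filtering the environment before invoking ko. The allowed prefix and name below are assumptions taken from this thread only (`KO_` and `KIND_CLUSTER_NAME`); they are not an authoritative list of the variables ko reads, and `filterEnv` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative allow list only; assumed from the discussion, not from ko's docs.
var allowedPrefixes = []string{"KO_"}
var allowedNames = map[string]bool{"KIND_CLUSTER_NAME": true}

// allowed reports whether a variable name is on the allow list.
func allowed(name string) bool {
	if allowedNames[name] {
		return true
	}
	for _, p := range allowedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// filterEnv splits NAME=value entries into the entries to keep and the
// names that were dropped. strings.Cut requires Go 1.18 or newer.
func filterEnv(environ []string) (kept, dropped []string) {
	for _, kv := range environ {
		name, _, _ := strings.Cut(kv, "=")
		if allowed(name) {
			kept = append(kept, kv)
		} else {
			dropped = append(dropped, name)
		}
	}
	return kept, dropped
}

func main() {
	kept, dropped := filterEnv([]string{
		"KO_DOCKER_REPO=registry.example.com/team",
		"KIND_CLUSTER_NAME=dev",
		"LD_PRELOAD=./hook.so",
	})
	fmt.Println("passed to ko:", kept)
	fmt.Println("dropped:", dropped)
}
```

Note that a filter this strict would also drop Go's own variables such as `GOOS` or `GOFLAGS`, so a real builder would have to decide explicitly which of those to pass through.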
> An allow list (instead of a disallow list) is more reliable in general. What are the env variables ko takes as input? Most start with `KO_`, but there also seems to be a `KIND_` option mentioned in the readme?
> Having this well documented can also help the builder's writer. In my case I'd like to only allow the env variables that ko knows it should handle, for example.
An allowlist of go tool flags will inevitably become burdensome, I worry -- e.g. Go v1.1X adds `-foo` and now `ko` has to add it to the allowlist and cut a release to avoid breakages. We've already had some issues with such go version incompatibilities. (In general I agree with you about allow vs deny, but I'm a bit scared about the prospect in this case)
`ko` has minimal envs itself, and almost none affect building (`KIND_CLUSTER_NAME` affects publishing but not building), but the passthru Go flags/envs are the real problem.
I'm working on generating SLSA provenance level 3 for ko using GitHub workflows - see https://security.googleblog.com/2022/04/improving-software-supply-chain.html
It's easy to generate provenance for the ko command + the env variables, however we lose some information about the go command itself. I would very much like to be able to add the go commands to the provenance.
So I think what I need is:
1. A dry run option, which resolves dynamically-resolved flags (ldflags) without running the compilation steps. The reason I'm asking for a dry run and not just for ko to print this info after compilation is that the arguments are not trusted and someone may pass `go build -toolexec` (execute some random command) and hijack the information. So I need this information before any user-defined arguments are "run".
2. Other information that can be computed without running user input would be useful, but I'm not sure what's feasible here... The SBOM for the go program could be determined post-build if the right flags are passed to the go compiler, and I don't think you can figure out which entries of the go.sum make it to the final build anyway without running the compiler. The list of base images would be useful too, but this may change between a dry run and an actual run so would not be reliable... and it can be determined post-build. Any advice/feedback welcome, though!
3. A way to print this in a JSON format that can be parsed by the caller, which I can then use to generate the provenance.
(1) and (3) are what I'm interested in, and any recommendations for (2) are welcome :-)
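To make the request in (1) and (3) concrete, here is a sketch of what a machine-readable dry-run report could look like and how a caller might parse it. The `DryRunReport` type and its field names are entirely hypothetical; ko does not define this format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// DryRunReport is a hypothetical schema for dry-run output. It is an
// assumption for illustration and does not describe real ko output.
type DryRunReport struct {
	Command []string `json:"command"` // resolved go command, one argument per element
	Env     []string `json:"env"`     // NAME=value pairs that would be passed to it
}

// render serializes the report as the tool side would print it.
func render(r DryRunReport) ([]byte, error) {
	return json.MarshalIndent(r, "", "  ")
}

// parse decodes the report as the caller generating provenance would.
func parse(data []byte) (DryRunReport, error) {
	var r DryRunReport
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	out, err := render(DryRunReport{
		Command: []string{"go", "build", "-trimpath", "-ldflags=-X main.version=v1.2.3", "./cmd/app"},
		Env:     []string{"CGO_ENABLED=0", "GOOS=linux", "GOARCH=amd64"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "render:", err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```

Keeping the command as a list of arguments rather than a single string avoids shell-quoting ambiguity when the caller copies it into the provenance.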