sigstore / cosign

Code signing and transparency for containers and binaries
Apache License 2.0

Sanitize cosign's dependency tree #1462

Open imjasonh opened 2 years ago

imjasonh commented 2 years ago

Problem

Cosign's module dependencies are pretty heavy.

Its go.mod currently transitively depends on k8s.io/client-go, the AWS SDK, the Azure SDK, a GCP cloud storage API client, Docker internals, fsnotify, OpenTelemetry, a Kubernetes controller framework, a GitHub API client, cuelang, and many more.

This can make cosign difficult to consume for clients that don't want all that functionality and may just want to sign some bytes using a key, push the resulting image to an OCI registry, and upload data to Rekor with certs via Fulcio.

For example, sget, which does relatively little compared to cosign -- it verifies signatures on an image and fetches a blob -- has a go.mod file that's 280 lines, producing a 72 MB binary 🏋️ .

Large dependency trees lead to larger built binaries (see above), larger attack surfaces, longer and more fragile builds, and (most interesting to me personally as a maintainer!) more toil for maintainers to update dependencies and sift through security alerts for those vulnerabilities.

Possible Solutions

  • Splitting the cosigned webhook into a separate Go module from the cosign CLI, possibly even moving it to a separate repo
  • Taking better advantage of Go interfaces, especially for KMS services, and moving implementations of those interfaces into separate Go modules or separate repositories

The cosign CLI would likely not get any smaller as a result of this work -- it still naturally depends on all the KMSes, K8s, and cloud SDKs -- but other clients that want to do cosign-like things (signing, and especially verifying) wouldn't need to.

cc @luhring @jonjohnsonjr

dlorenc commented 2 years ago
  • Splitting the cosigned webhook into a separate Go module from the cosign CLI, possibly even moving it to a separate repo

I don't think this will really help, but I might be misunderstanding the exact issue. The k8s libraries shouldn't show up in a downstream binary or dependency tree unless they're actually used.

imjasonh commented 2 years ago

I don't think this will really help, but I might be misunderstanding the exact issue. The k8s libraries shouldn't show up in a downstream binary or dependency tree unless they're actually used.

I mentioned this since the cosign CLI depends on k8s.io/client-go just to get/create/update secrets, when it could shell out to kubectl instead (introducing a different set of problems, understandably). But that wouldn't help anyway, because the Go module it shares with the cosigned webhook necessarily and correctly depends on knative/pkg, k8s.io/client-go, etc.

Go might be smart enough to trim that dependency from the binary, but it still incurs maintenance cost for cosign CLI and SDK maintainers who shouldn't care about K8s' own large dependency graph, security alerts, dependabots, etc.
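
For context, here is a minimal sketch of what that client-go usage for secrets looks like; it is illustrative only, and the helper name and kubeconfig handling are assumptions rather than cosign's actual code:

```go
// Illustrative sketch, not cosign's actual implementation: creating a
// Kubernetes secret via client-go. Even this small task pulls in
// client-go's auth, transport, and API machinery packages.
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// createKeySecret is a hypothetical helper, not a cosign API.
func createKeySecret(ctx context.Context, namespace, name string, data map[string][]byte) error {
	// Load the local kubeconfig and build a typed clientset.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		return err
	}
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Data:       data,
	}
	_, err = client.CoreV1().Secrets(namespace).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```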

luhring commented 2 years ago

it still incurs maintenance cost for cosign CLI and SDK maintainers who shouldn't care about K8s' own large dependency graph, security alerts, dependabots, etc.

💯 We're working on integrating some Cosign functionality into Syft, and it looks like we need k8s.io/api via Cosign:

$ go mod why -m k8s.io/api
# k8s.io/api
github.com/anchore/syft/cmd
github.com/sigstore/cosign/cmd/cosign/cli/sign
github.com/sigstore/cosign/pkg/signature
github.com/sigstore/cosign/pkg/cosign/kubernetes
k8s.io/api/core/v1

(from Syft at ec1cceb)

It could be that there's a better way to structure the code to avoid these dependencies, but it's not clear to me how. I think if the "package graph" extends into a given module, that module is needed during build.

dlorenc commented 2 years ago

💯 We're working on integrating some Cosign functionality into Syft, and it looks like we need k8s.io/api via Cosign:

This is correct - today cosign does depend on k8s/api directly for the generate-key-pair command which can create a secret in k8s. Moving this to use a smaller library or shell out to kubectl would solve that problem.

It would not remove k8s from the go.mod file though, because the cosigned webhook still imports that stuff.

I think there are a few different things we're trying to optimize for here:

I don't think moving the cosigned webhook somewhere else helps with any of these, it just shifts the problem around and means there's a separate repo and go.mod file maintainers need to keep up to date.

imjasonh commented 2 years ago

I don't think moving the cosigned webhook somewhere else helps with any of these, it just shifts the problem around and means there's a separate repo and go.mod file maintainers need to keep up to date.

It helps me as library author and cosign maintainer, since it shifts the problem somewhere where I'm less (or not) involved. It does make dependency maintenance slightly more work for cosigned maintainers, but there are always going to be many fewer of those than there will be consumers of a cosign SDK.

I imagine the following "hierarchy of needs" for consumers of cosign logic:

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

dlorenc commented 2 years ago

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

I don't think this part is necessarily true, but that might be what I'm missing. A downstream client should be able to import code from cosign without introducing all the dependencies to their own go.mod.

imjasonh commented 2 years ago

For KMSes it might be as simple as moving KMS provider registration out of pkg/signature/kms, into each impl's package, so you can selectively depend on only those KMSes you actually want, and depend on none of them if you want none of them:

https://github.com/sigstore/sigstore/blob/v1.1.0/pkg/signature/kms/kms.go#L31-L44
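
A rough sketch of that registration pattern, analogous to database/sql drivers; the package layout, names, and interface below are hypothetical and not the current sigstore API:

```go
// Hypothetical registry package sketching the proposed structure.
package kms

import (
	"context"
	"fmt"
	"strings"
)

// SignerVerifier is a stand-in for the real signing interface.
type SignerVerifier interface {
	SignMessage(msg []byte) ([]byte, error)
	VerifySignature(sig, msg []byte) error
}

// ProviderInit constructs a provider for a given key resource URI.
type ProviderInit func(ctx context.Context, keyResourceID string) (SignerVerifier, error)

var providers = map[string]ProviderInit{}

// AddProvider is called from each implementation's init(), so a provider's
// dependencies are only linked in if its package is imported.
func AddProvider(refPrefix string, pi ProviderInit) {
	providers[refPrefix] = pi
}

// Get resolves a key reference to whichever provider was registered for it.
func Get(ctx context.Context, keyResourceID string) (SignerVerifier, error) {
	for prefix, pi := range providers {
		if strings.HasPrefix(keyResourceID, prefix) {
			return pi(ctx, keyResourceID)
		}
	}
	return nil, fmt.Errorf("no KMS provider registered for %q", keyResourceID)
}
```

Each backend would call AddProvider from its own package's init(), so a consumer opts into a backend with a blank import of just that package, and pulls in no KMS SDKs at all if it imports none of them.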

sambhav commented 2 years ago

Also related https://github.com/sigstore/cosign/issues/651

luhring commented 2 years ago

Today, if I want to do anything with container images, my go.mod picks up deps to serve all of the above.

I don't think this part is necessarily true, but that might be what I'm missing. A downstream client should be able to import code from cosign without introducing all the dependencies to their own go.mod.

I think this is right, in that this doesn't have to be true, but I think we're seeing that it happens to be true today with the current code. (Maybe not "all" deps, but more deps than are actually needed by the downstream consumer.)

Taking better advantage of Go interfaces, especially for KMS services, and moving implementations of those interfaces into separate Go modules or separate repositories

This might be worth exploring first, since it's presumably less drastic of a change than introducing a new repo / Go module.

My understanding of Go's build approach is that it looks at the package import graph to determine which packages are needed, and from those which modules are needed.

@imjasonh's example is really interesting. init() is tricky because it can effectively start its own call graph, which means potentially traversing more packages and modules, and potentially without the consumer having any practical use for that additional code.

Also, this might be overly pedantic, but what I'm looking to minimize (as a lib consumer) isn't go.mod entries in my project, but the number of modules in the "build list" in my project's binary (e.g. Syft).
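
As an aside, one way to see that build list is to inspect the module information the Go toolchain embeds in the binary (the path shown is just an example):

$ go version -m ./syft

This prints a dep line for each module version actually compiled into the binary, as opposed to everything that appears in go.mod/go.sum.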

dlorenc commented 2 years ago

Also, this might be overly pedantic, but what I'm looking to minimize (as a lib consumer) isn't go.mod entries in my project, but the number of modules in the "build list" in my project's binary (e.g. Syft).

Bingo! This is what we can control without needing to split things up across repos.

vrothberg commented 2 years ago

Shall we set up a dedicated meeting for this topic? I think we can start by getting the very low-level bits done (i.e., signing/verifying bytes + metadata), but I don't yet have a clear picture of how the code should be structured into smaller Go packages (to prune the heavier dependencies).

Cc: @mtrmac

imjasonh commented 2 years ago

I 👍 ed this but I realize it doesn't notify anybody when you do that. 🤦‍♂️

So, 👍, we should meet some day next week (I'm out Mon+Tues) to map out specific packages/APIs that we'd want to refactor/move/etc.

lukehinds commented 2 years ago

Just looping back on this issue: cosigned is now going to move into its own repo, and we have the following issues, which @lkatalin has expressed an interest in working on:

I can create a sigstore community calendar invite; how should we set a time? @vrothberg / @imjasonh et al?

vrothberg commented 2 years ago

Thank you for the ping. I'd join the meeting ✔️

lukehinds commented 2 years ago

How would next Wednesday work for 11am EST?

imjasonh commented 2 years ago

That works for me!

vrothberg commented 2 years ago

Works for me as well ✔️

lukehinds commented 2 years ago

@imjasonh @vrothberg Done; you will see this on the community calendar. Access to the calendar is here:

https://github.com/sigstore/community#community-meetings

I may not be able to make it; I only just noticed I have a webinar / talk thing going on. Sometimes Hangouts (even though it's on the community calendar) inherits Red Hat corp Google rights, but we have @vrothberg, who should be able to open the meeting and admit others.

jeff-mccoy commented 2 years ago

Just finding this issue from Twitter. Quick comment for https://github.com/defenseunicorns/zarf: since we include both cosign and syft as SDKs, the dependency tree is large, but I'd be more concerned about Cosign just falling back to kubectl for us, as we can't guarantee that binary is available when our tool runs.

imjasonh commented 2 years ago

Just finding this issue from Twitter. Quick comment for https://github.com/defenseunicorns/zarf: since we include both cosign and syft as SDKs, the dependency tree is large, but I'd be more concerned about Cosign just falling back to kubectl for us, as we can't guarantee that binary is available when our tool runs.

I assume that's responding to https://github.com/sigstore/cosign/issues/1462#issuecomment-1040653999:

I mentioned this since the cosign CLI depends on k8s.io/client-go just to get/create/update secrets, when it could shell out to kubectl instead (introducing a different set of problems, understandably). But that wouldn't help anyway, because the Go module it shares with the cosigned webhook necessarily and correctly depends on knative/pkg, k8s.io/client-go, etc.

I agree; I don't think shelling out to kubectl would be a good approach, for basically the reasons you describe.

It might be worth exploring replacing the k8s client used to get/create/update secrets with standard HTTP calls, but even that will probably be more headache than it's worth since we'll also want to find and parse kubeconfigs to manage auth to the cluster in question. I think we're stuck with our k8s dependency for good.
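
For the record, a minimal sketch of what that plain-HTTP route might look like; the server URL and bearer token are assumed to come from somewhere, and resolving them properly is exactly the kubeconfig/auth headache described above:

```go
// Sketch only: create a Secret via the Kubernetes REST API without client-go.
package example

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// createSecret is a hypothetical helper, not a cosign API.
func createSecret(ctx context.Context, server, token, namespace, name string, data map[string][]byte) error {
	body, err := json.Marshal(map[string]interface{}{
		"apiVersion": "v1",
		"kind":       "Secret",
		"metadata":   map[string]string{"name": name, "namespace": namespace},
		"data":       data, // []byte values are base64-encoded by encoding/json, as the API expects
	})
	if err != nil {
		return err
	}
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/secrets", server, namespace)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}
```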

fmoessbauer commented 6 months ago

The current cosign binary is over 100 MB in size, making it unsuitable for embedded use cases (also discussed in https://github.com/sigstore/cosign/issues/2989). A huge dependency tree is also undesirable from a security perspective. While the sigstore-go implementation is a bit smaller (60 MB), it can only perform validation, so it is no real solution to the problem:

Instead, it would be much better to have a minimal CLI that dynamically loads exactly the components needed for a particular operation. This would significantly reduce the attack surface and make it possible to ship only the components that are actually needed.

AFAIK the only Go solution to that problem is the plugin mechanism, which has major limitations. Are there any plans to implement lightweight clients in C or Rust (but modularized, not statically linked)?
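
For context, a minimal sketch of Go's plugin mechanism; the .so path and exported symbol are hypothetical, and plugins only work on a few platforms and require the host and plugin to be built with identical toolchains and dependency versions, which is part of the limitation mentioned:

```go
// Sketch of Go's plugin mechanism with a hypothetical plugin and symbol.
package main

import (
	"fmt"
	"log"
	"plugin"
)

func main() {
	// Dynamically load a backend only when it's needed for this operation.
	p, err := plugin.Open("./cosign-kms-aws.so") // hypothetical plugin
	if err != nil {
		log.Fatal(err)
	}
	// Look up an exported symbol; its type must match exactly.
	sym, err := p.Lookup("Sign") // hypothetical exported function
	if err != nil {
		log.Fatal(err)
	}
	sign, ok := sym.(func([]byte) ([]byte, error))
	if !ok {
		log.Fatal("unexpected type for Sign symbol")
	}
	sig, err := sign([]byte("payload"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("signature: %x\n", sig)
}
```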

haydentherapper commented 6 months ago

We aim to make sigstore-go smaller (https://github.com/sigstore/sigstore-go/issues/23), though there are tradeoffs between reimplementing dependencies, maintenance burden, and library size that we have to consider. We'll also be adding signing to sigstore-go soon (https://github.com/sigstore/sigstore-go/issues/136), without pulling in KMS dependencies.

There is an implementation of the sigstore signing and verification logic in Rust in sigstore/sigstore-rs, though it has not yet reached a 1.0 release. I'd love to see an example of calling into this library through the FFI - this is on Sigstore's community roadmap.