sigstore / policy-controller

Sigstore Policy Controller - an admission controller that can be used to enforce policy on a Kubernetes cluster based on verifiable supply-chain metadata from cosign

No matching policies (Restricted Environment) #681

Open gustavoromerobenitez opened 1 year ago

gustavoromerobenitez commented 1 year ago

Description

I'm testing the policy-controller in an air-gapped environment using GKE and a private container registry. I have deployed it via the official Helm chart in its own namespace security-enforcement.

I'm also using a custom registryCaBundle and image-pull-secrets, and the webhook can connect to the private registry to resolve tags into digests.

The following extra args are also set: policy-resync-period: 1m and disable-tuf: true.

After trying more complex configurations and policies without success, I wanted to test the simplest possible ClusterImagePolicy:

First, I label the namespace to opt it in:

```shell
k label namespace <namespace> policy.sigstore.dev/include=true
```

Then I create the following Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: <namespace>
spec:
  containers:
  - name: nginx
    image: <PRIVATE_REGISTRY>/library/nginx:latest
    ports:
    - containerPort: 80
```
```yaml
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: manual-allow-all
spec:
  authorities:
  - name: allow-all
    static:
      action: pass
  images:
  - glob: '**'
  mode: enforce
```

but I get the following error with that policy and any other I have tried (including mode: warn):

```
Error from server (BadRequest): error when creating "pod.yaml": admission webhook "policy.sigstore.dev" denied the request: validation failed: no matching policies: spec.containers[0].image
<PRIVATE_REGISTRY>/library/nginx@sha256:1ed4dff2b99011798f2c228667d7cb4f4e2bd76b2adc78fd881d39f923e78c9d
```
```
{"caller":"webhook/admission.go:93", "commit":"89ef904-dirty", "level":"info", "logger":"policy-controller", "msg":"Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc000823b00), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"4985"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc000bed500), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:4985, TransferEncoding:[]string(nil), Close:false, Host:"webhook.security-enforcement.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.225.7.41:32932", RequestURI:"/validations?timeout=10s", TLS:(*tls.ConnectionState)(0xc000fc0d10), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc000bed540)}", "ts":"2023-03-23T13:08:08.734Z"}

{"caller":"validation/validation_admit.go:180", "commit":"89ef904-dirty", "knative.dev/kind":"/v1, Kind=Pod", "knative.dev/name":"nginx", "knative.dev/namespace":"<NAMESPACE>", "knative.dev/operation":"CREATE", "knative.dev/resource":"/v1, Resource=pods", "knative.dev/subresource":"", "knative.dev/userinfo":"***", "level":"error", "logger":"policy-controller", "msg":"Failed the resource specific validation", "stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate knative.dev/pkg@v0.0.0-20221027143007-728dfd8e2862/webhook/resourcesemantics/validation/validation_admit.go:180 knative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit knative.dev/pkg@v0.0.0-20221027143007-728dfd8e2862/webhook/resourcesemantics/validation/validation_admit.go:79 knative.dev/pkg/webhook.admissionHandler.func1 knative.dev/pkg@v0.0.0-20221027143007-728dfd8e2862/webhook/admission.go:123 net/http.HandlerFunc.ServeHTTP net/http/server.go:2109 net/http.(*ServeMux).ServeHTTP net/http/server.go:2487 knative.dev/pkg/webhook.(*Webhook).ServeHTTP knative.dev/pkg@v0.0.0-20221027143007-728dfd8e2862/webhook/webhook.go:262 knative.dev/pkg/network/handlers.(*Drainer).ServeHTTP knative.dev/pkg@v0.0.0-20221027143007-728dfd8e2862/network/handlers/drain.go:113 net/http.serverHandler.ServeHTTP net/http/server.go:2947 net/http.(*conn).serve net/http/server.go:1991", "ts":"2023-03-23T13:08:08.746Z"}

{"admissionreview/allowed":false, "admissionreview/result":"&Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:validation failed: no matching policies: spec.containers[0].image <PRIVATE_REGISTRY>/library/nginx@sha256:1ed4dff2b99011798f2c228667d7cb4f4e2bd76b2adc78fd881d39f923e78c9d,Reason:BadRequest,Details:nil,Code:400,}", "admissionreview/uid":"44e74297-d2d7-42f0-9be7-ccf32d629b8e", "caller":"webhook/admission.go:151", "commit":"89ef904-dirty", "knative.dev/kind":"/v1, Kind=Pod", "knative.dev/name":"nginx", "knative.dev/namespace":"<NAMESPACE>", "knative.dev/operation":"CREATE", "knative.dev/resource":"/v1, Resource=pods", "knative.dev/subresource":"", "knative.dev/userinfo":"***", "level":"info", "logger":"policy-controller", "msg":"remote admission controller audit annotations=map[string]string(nil)", "ts":"2023-03-23T13:08:08.746Z"}

{"admissionreview/allowed":false, "admissionreview/result":"&Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:validation failed: no matching policies: spec.containers[0].image <PRIVATE_REGISTRY>/library/nginx@sha256:1ed4dff2b99011798f2c228667d7cb4f4e2bd76b2adc78fd881d39f923e78c9d,Reason:BadRequest,Details:nil,Code:400,}", "admissionreview/uid":"44e74297-d2d7-42f0-9be7-ccf32d629b8e", "caller":"webhook/admission.go:152", "commit":"89ef904-dirty", "knative.dev/kind":"/v1, Kind=Pod", "knative.dev/name":"nginx", "knative.dev/namespace":"<NAMESPACE>", "knative.dev/operation":"CREATE", "knative.dev/resource":"/v1, Resource=pods", "knative.dev/subresource":"", "knative.dev/userinfo":"***", "level":"debug", "logger":"policy-controller", "msg":"AdmissionReview patch={ type: , body: }", "ts":"2023-03-23T13:08:08.746Z"}
```
```
INFO 2023-03-23T13:07:23.502722735Z [resource.labels.containerName: policy-webhook] {"caller":"clusterimagepolicy/controller.go:96", "commit":"89ef904-dirty", "level":"info", "logger":"clusterimagepolicy", "msg":"Doing a global resync on ClusterImagePolicies due to ConfigMap changing or resync period.", "ts":"2023-03-23T13:07:23.502Z"}

WARNING 2023-03-23T13:07:49.323759264Z [resource.labels.containerName: policy-webhook] k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *v1alpha1.TrustRoot: trustroots.policy.sigstore.dev is forbidden: User "system:serviceaccount:security-enforcement:cosign-policy-controller-app-policy-webhook" cannot list resource "trustroots" in API group "policy.sigstore.dev" at the cluster scope

ERROR 2023-03-23T13:07:49.323835992Z [resource.labels.containerName: policy-webhook] k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *v1alpha1.TrustRoot: failed to list *v1alpha1.TrustRoot: trustroots.policy.sigstore.dev is forbidden: User "system:serviceaccount:security-enforcement:cosign-policy-controller-app-policy-webhook" cannot list resource "trustroots" in API group "policy.sigstore.dev" at the cluster scope

INFO 2023-03-23T13:07:57.922789694Z [resource.labels.containerName: policy-webhook] {"caller":"webhook/conversion.go:45", "commit":"89ef904-dirty", "level":"info", "logger":"clusterimagepolicy", "msg":"Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc0047059e0), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"1198"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc00470c800), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1198, TransferEncoding:[]string(nil), Close:false, Host:"policy-webhook.security-enforcement.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.225.7.43:35944", RequestURI:"/resource-conversion?timeout=30s", TLS:(*tls.ConnectionState)(0xc004c62840), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00470c840)}", "ts":"2023-03-23T13:07:57.922Z"}

INFO 2023-03-23T13:07:57.926033177Z [resource.labels.containerName: policy-webhook] {"caller":"webhook/conversion.go:45", "commit":"89ef904-dirty", "level":"info", "logger":"clusterimagepolicy", "msg":"Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc004705cb0), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"1275"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc00470cec0), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1275, TransferEncoding:[]string(nil), Close:false, Host:"policy-webhook.security-enforcement.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.225.7.43:35944", RequestURI:"/resource-conversion?timeout=30s", TLS:(*tls.ConnectionState)(0xc004c62840), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00470cf00)}", "ts":"2023-03-23T13:07:57.925Z"}
```

I've also tried other glob patterns, including the full path of the image in the registry, but to no avail.
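One detail visible in the denial message above is that the webhook matches globs against the digest-resolved reference (the `@sha256:...` form), not against the `:latest` tag in the Pod spec. A minimal sketch of that matching, using POSIX shell `case` patterns as a rough stand-in for the controller's Go glob library (whose exact semantics, e.g. whether `*` crosses `/`, may differ; the registry host below is a placeholder):

```shell
# Digest-resolved reference, as reported in the webhook's denial message.
image="registry.example.com/library/nginx@sha256:1ed4dff2b99011798f2c228667d7cb4f4e2bd76b2adc78fd881d39f923e78c9d"

# match PATTERN STRING: returns 0 if STRING matches the shell glob PATTERN.
match() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

match '**' "$image"                                  && echo "catch-all '**' matches"
match 'registry.example.com/library/nginx*' "$image" && echo "repository-prefix glob matches"
match '*nginx:latest' "$image"                       || echo "tag-pinned glob does not match the digest form"
```

Under these assumptions, a glob pinned to a tag would never match once the tag is resolved to a digest, while `**` or a repository-prefix glob should; that `**` nevertheless fails here is exactly what this issue is about.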

Could you tell me why the policy is not matched?

Should I be concerned about those TrustRoot errors, or can they be ignored or suppressed?

Thank you.

Version

policy-controller AppVersion: 0.7.0
policy-controller Helm Chart version: 0.5.4
Kubernetes (GKE) version: 1.24.9-gke.3200

bobthomson70 commented 1 year ago

This bug is a blocker for us.

meons commented 1 year ago

Same here

hectorj2f commented 1 year ago

@gustavoromerobenitez You shouldn't be concerned about those TrustRoot errors (although I'm going to check whether I can reproduce them). They are not related to this issue.

Could you try using this value for authorities?

```yaml
authorities: [static: {action: pass}]
```
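For reference, that inline flow-style value expands to the block form below; this is a sketch reusing the policy name and glob from the original report, with the authority's name field dropped:

```yaml
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: manual-allow-all
spec:
  images:
  - glob: '**'
  authorities:
  - static:
      action: pass
```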

If you want the policy controller to ignore images that match no policy, you can also configure this at the controller level by running the following command:

```shell
kubectl patch cm config-policy-controller -n cosign-system --type merge -p '{"data":{"no-match-policy":"warn"}}'
```

Anyway, I'll give it a try on my own cluster to verify that the disable-tuf parameter is not causing any issue here.

gustavoromerobenitez commented 1 year ago

Thanks @hectorj2f, I will try those options and post the results here.

How can I tell which policies have been loaded by the controller, and which policies the image path is being checked against? Thank you

hectorj2f commented 1 year ago

@gustavoromerobenitez There is a ConfigMap, config-image-policies, that shows the loaded policies. The logs also show a counter of how many policies match a specific image.

gustavoromerobenitez commented 1 year ago

@hectorj2f While trying to test your proposed changes, I noticed that I'm also hitting https://github.com/sigstore/policy-controller/issues/41. I'm not sure if they are related, but the policy-webhook crashes due to the missing certificates, despite my having re-created the namespace multiple times.

```
2023-03-31 18:08:28.734 CEST
{caller: webhook/webhook.go:154, commit: 89ef904-dirty, level: warn, logger: clusterimagepolicy, msg: server key missing, ts: 2023-03-31T16:08:28.734Z}
2023-03-31 18:08:28.734 CEST
2023/03/31 16:08:28 http: TLS handshake error from 10.225.7.23:37392: tls: no certificates configured
```

The webhook-certs secret is created correctly, but the policy-webhook-certs secret contains no data. Perhaps that issue should be re-opened.

I will try disabling auto-tls via annotation, as per the documentation. Would I need to configure anything else on the policy controller, or would it rely on the Knative functionality?

hectorj2f commented 1 year ago
> https://github.com/sigstore/policy-controller/issues/41. I'm not sure if they are related but the policy-webhook crashes due to the missing certificates, despite having re-created the namespace multiple times.

If you tried reinstalling the chart, you need to delete the leases; that sometimes happens:

```shell
kubectl delete leases -n cosign-system
```

hectorj2f commented 1 year ago

It is a known issue that the webhooks are not able to re-claim the leases; we haven't yet figured out how to solve it. So the recommendation is to delete the leases when the pods are not getting ready.

gustavoromerobenitez commented 1 year ago

@hectorj2f Thank you. I did delete the leases before posting above, before deleting the namespace and re-installing the chart, but the certificate issue was still present. I haven't had the opportunity to test without auto-tls yet, though.

hectorj2f commented 1 year ago

@gustavoromerobenitez I just tried it, and the problem goes away by deleting the chart (waiting until it is properly deleted), then deleting the leases and reinstalling. If your problem persists, we could try increasing the leader election time.

The problem also goes away when using multiple replicas instead of the default single replica. We'll unify the controllers into a single pod, so it will be easier to increase the number of replicas by default.