knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0
1.41k stars 590 forks source link

eventing-webhook mutates itself when SinkBinding is present #8161

Open mgencur opened 4 weeks ago

mgencur commented 4 weeks ago

Describe the bug Running downgrade from Eventing 1.16 to 1.15 can fail with "no endpoints available for service "eventing-webhook" when ContainerSource (and thus SinkBinding) is present in the cluster.

The upgrade tests scale the eventing-webhook to zero and back to 3. But scaling back to 3 fails with this error:

executor.go:189: Aug 15 11:44:52.727 install_latest_release [ERR] Error from server (InternalError): Internal error occurred: failed calling webhook "sinkbindings.webhook.sources.knative.dev": failed to call webhook: Post "[https://eventing-webhook.knative-eventing.svc:443/sinkbindings?timeout=10s](https://eventing-webhook.knative-eventing.svc/sinkbindings?timeout=10s)": no endpoints available for service "eventing-webhook"

There's a MutatingWebhookConfiguration named sinkbindings.webhook.sources.knative.dev which is run by eventing-webhook itself:

- apiVersion: admissionregistration.k8s.io/v1
  kind: MutatingWebhookConfiguration
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"admissionregistration.k8s.io/v1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{},"labels":{"app.kubernetes.io/name":"knative-eventing","app.kubernetes.io/version":"1.15.0"},"name":"sinkbindings.webhook.sources.knative.dev"},"webhooks":[{"admissionReviewVersions":["v1","v1beta1"],"clientConfig":{"service":{"name":"eventing-webhook","namespace":"knative-eventing"}},"failurePolicy":"Fail","name":"sinkbindings.webhook.sources.knative.dev","sideEffects":"None","timeoutSeconds":10}]}
    creationTimestamp: "2024-08-15T07:51:05Z"
    generation: 3
    labels:
      app.kubernetes.io/name: knative-eventing
      app.kubernetes.io/version: 1.15.0
    name: sinkbindings.webhook.sources.knative.dev
    resourceVersion: "13981"
    uid: 0a8e964a-e0d9-457c-8ea5-21bcb694f23d
  webhooks:
  - admissionReviewVersions:
    - v1
    - v1beta1
    clientConfig:
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNqakNDQWpTZ0F3SUJBZ0lSQUl6V2xXRE8zcHdEa01SZkMwOElTRnN3Q2dZSUtvWkl6ajBFQXdJd1JqRVUKTUJJR0ExVUVDaE1MYTI1aGRHbDJaUzVrWlhZeExqQXNCZ05WQkFNVEpXVjJaVzUwYVc1bkxYZGxZbWh2YjJzdQphMjVoZEdsMlpTMWxkbVZ1ZEdsdVp5NXpkbU13SGhjTk1qUXdPREUxTURjMU1UQTJXaGNOTWpRd09ESXlNRGMxCk1UQTJXakJHTVJRd0VnWURWUVFLRXd0cmJtRjBhWFpsTG1SbGRqRXVNQ3dHQTFVRUF4TWxaWFpsYm5ScGJtY3QKZDJWaWFHOXZheTVyYm1GMGFYWmxMV1YyWlc1MGFXNW5Mbk4yWXpCWk1CTUdCeXFHU000OUFnRUdDQ3FHU000OQpBd0VIQTBJQUJKelc5alhWaFNLRG43Zy9LL3ZmK2traHJhMUhVVTlQRFpyVUVQWDI5YXFmLzNPc09SMCszSTFqCmJEUEJsMmw1OFBsY1BERExSRUEwaFpxc09nSVhOdCtqZ2dFQk1JSCtNQTRHQTFVZER3RUIvd1FFQXdJQ2hEQWQKQmdOVkhTVUVGakFVQmdnckJnRUZCUWNEQVFZSUt3WUJCUVVIQXdJd0R3WURWUjBUQVFIL0JBVXdBd0VCL3pBZApCZ05WSFE0RUZnUVUzVTYxODR1MEp4ZU9MUXVLek5FZDY3dzl1dnN3Z1p3R0ExVWRFUVNCbERDQmtZSVFaWFpsCmJuUnBibWN0ZDJWaWFHOXZhNEloWlhabGJuUnBibWN0ZDJWaWFHOXZheTVyYm1GMGFYWmxMV1YyWlc1MGFXNW4KZ2lWbGRtVnVkR2x1WnkxM1pXSm9iMjlyTG10dVlYUnBkbVV0WlhabGJuUnBibWN1YzNaamdqTmxkbVZ1ZEdsdQpaeTEzWldKb2IyOXJMbXR1WVhScGRtVXRaWFpsYm5ScGJtY3VjM1pqTG1Oc2RYTjBaWEl1Ykc5allXd3dDZ1lJCktvWkl6ajBFQXdJRFNBQXdSUUlnRXNEWk1sY3NXTzZZcDlrRWoyZTZRYzRRZTBlT1RwWmRmeXF6TkpLMnZXSUMKSVFDYUc2azNHYk1FZnhIakFqeU5IWTdyME5GNFRMSG1KWG5jdzZOVURrUjc1QT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
      service:
        name: eventing-webhook
        namespace: knative-eventing
        path: /sinkbindings
        port: 443
    failurePolicy: Fail
    matchPolicy: Equivalent
    name: sinkbindings.webhook.sources.knative.dev
    namespaceSelector:
      matchExpressions:
      - key: bindings.knative.dev/exclude
        operator: NotIn
        values:
        - "true"
    objectSelector:
      matchExpressions:
      - key: bindings.knative.dev/exclude
        operator: NotIn
        values:
        - "true"
    reinvocationPolicy: IfNeeded
    rules:
    - apiGroups:
      - apps
      apiVersions:
      - v1
      operations:
      - CREATE
      - UPDATE
      resources:
      - deployments/*
      scope: '*'
    sideEffects: None
    timeoutSeconds: 10

As a result, the eventing-webhook is not started again and remains scaled to zero.

It happens on this PR which extends upgrade/downgrade tests in a specific way. Some resources are created before upgrade and verified after upgrade, and some resources are created after upgrade and verified later after downgrade. The tests now include ContainerSource which creates SinkBindings.

Example failure is in this run

However, the eventing-webhook Deployment already has the label bindings.knative.dev/exclude: "true" which should exclude it from the webhook selection:

- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "3"
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"eventing-webhook","app.kubernetes.io/name":"knative-eventing","app.kubernetes.io/version":"1.15.0","bindings.knative.dev/exclude":"true"},"name":"eventing-webhook","namespace":"knative-eventing"},"spec":{"selector":{"matchLabels":{"app":"eventing-webhook","role":"eventing-webhook"}},"template":{"metadata":{"labels":{"app":"eventing-webhook","app.kubernetes.io/component":"eventing-webhook","app.kubernetes.io/name":"knative-eventing","app.kubernetes.io/version":"1.15.0","role":"eventing-webhook"}},"spec":{"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchLabels":{"app":"eventing-webhook"}},"topologyKey":"kubernetes.io/hostname"},"weight":100}]}},"containers":[{"env":[{"name":"SYSTEM_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"CONFIG_LOGGING_NAME","value":"config-logging"},{"name":"METRICS_DOMAIN","value":"knative.dev/eventing"},{"name":"WEBHOOK_NAME","value":"eventing-webhook"},{"name":"WEBHOOK_PORT","value":"8443"},{"name":"SINK_BINDING_SELECTION_MODE","value":"exclusion"},{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}}],"image":"gcr.io/knative-releases/knative.dev/eventing/cmd/webhook@sha256:086fc8a0da40730bbfcc32cb12ebc5c883fe75ff05be1bba54435ef1fdc080ce","livenessProbe":{"httpGet":{"httpHeaders":[{"name":"k-kubelet-probe","value":"webhook"}],"port":8443,"scheme":"HTTPS"},"initialDelaySeconds":120,"periodSeconds":1},"name":"eventing-webhook","ports":[{"containerPort":8443,"name":"https-webhook"},{"containerPort":9090,"name":"metrics"},{"containerPort":8008,"name":"profiling"}],"readinessProbe":{"httpGet":{"httpHeaders":[{"name":"k-kubelet-probe","value":"webhook"}],"port":8443,"scheme":"HTTPS"},"periodSeconds":1},"resources":{"limits":{"cpu":"200m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"50Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsNonRoot":true,"seccompProfile":{"type":"RuntimeDefault"}},"terminationMessagePolicy":"FallbackToLogsOnError"}],"enableServiceLinks":false,"serviceAccountName":"eventing-webhook","terminationGracePeriodSeconds":300}}}}
    creationTimestamp: "2024-08-15T07:50:56Z"
    generation: 10
    labels:
      app.kubernetes.io/component: eventing-webhook
      app.kubernetes.io/name: knative-eventing
      app.kubernetes.io/version: 1.15.0
      bindings.knative.dev/exclude: "true"
    name: eventing-webhook
    namespace: knative-eventing
    resourceVersion: "14396"
    uid: 397f7f2f-8753-4de6-892b-eb63c42e9059
  spec:
    progressDeadlineSeconds: 600
    replicas: 0

Expected behavior The eventing-webhook is up-and-running after downgrade.

To Reproduce Having ContainerSource and doing downgrade from 1.16 to 1.15. This PR reproduces the behavior reliably.

Knative release version Downgrading from pre-release 1.16 to 1.15

Additional context

mgencur commented 4 weeks ago

The test log shows that downgrade fails with:

executor.go:189: Aug 15 11:44:52.727 install_latest_release [ERR] Error from server (InternalError):
 Internal error occurred: failed calling webhook "sinkbindings.webhook.sources.knative.dev": 
failed to call webhook: 
Post "[https://eventing-webhook.knative-eventing.svc:443/sinkbindings?timeout=10s](https://eventing-webhook.knative-eventing.svc/sinkbindings?timeout=10s)":
 no endpoints available for service "eventing-webhook"

The test scales the eventing-webhook down to 0 and back to 3. But scaling back to 3 fails.

The new tests from the pull request deploy a container source that brings SinkBinding that causes the issue. Note: I have updated the issue description and title.

mgencur commented 4 weeks ago

My experiments show that adding the label bindings.knative.dev/exclude=true to the knative-eventing namespace fixes the issues. But this is rather a workaround because we already put this label on individual objects. In the end, to exclude some objects, the user has to put the label bindings.knative.dev/exclude=true on both the namespace and individual objects, otherwise: either namespace selector or object selector from the webhook will add them back and include them. The solution would be probably to remove the namespaceSelector from the MutatingWebhookConfiguration sinkbindings.webhook.sources.knative.dev

Cali0707 commented 3 weeks ago

/triage accepted /cc @pierDipi @creydr

pierDipi commented 3 weeks ago

I think we can add this to the webhook configuration and it will be preserved [1]:

    namespaceSelector:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - "knative-eventing"

[1] https://github.com/knative/pkg/blob/89743d9bbf7cc691f156b945bb29f96989da4e03/webhook/psbinding/psbinding.go#L348-L352

mgencur commented 3 weeks ago

I think we can add this to the webhook configuration and it will be preserved [1]:

In this case, would it be possible to exclude individual objects in other namespaces through objectSelector like in this commit ? By using the example above all the other namespaces would be already included so excluding individual objects would not be possible, IMO. And I guess excluding whole namespace (if the user wants it) would not be possible either. We could probably flip the condition to use operator: In or use just objectSelector for exclusion.

pierDipi commented 3 weeks ago

exclusion case:

    namespaceSelector:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - "knative-eventing"
      - key: bindings.knative.dev/exclude
        operator: NotIn
        values:
        - "true"
    matchConditions:
      - expression: !has(request.object.metadata.labels) || !("bindings.knative.dev/exclude" in request.object.metadata.labels) || request.object.metadata.labels["bindings.knative.dev/exclude"] != "true"
        name: expression

that will only pass when:

given that from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#mutatingwebhook-v1-admissionregistration-k8s-io

MatchConditions is a list of conditions that must be met for a request to be sent to this webhook. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector

inclusion case:

    namespaceSelector:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - "knative-eventing"
      - key: bindings.knative.dev/include
        operator: In
        values:
        - "true"
    objectSelector:
      matchExpressions:
      - key: bindings.knative.dev/include
        operator: In
        values:
        - "true"

that will only pass when:

mgencur commented 2 days ago

When this issue is fixed, the following workaround should be removed as well in test/e2e-common.sh: link