Open mgencur opened 4 weeks ago
The test log shows that downgrade fails with:
executor.go:189: Aug 15 11:44:52.727 install_latest_release [ERR] Error from server (InternalError):
Internal error occurred: failed calling webhook "sinkbindings.webhook.sources.knative.dev":
failed to call webhook:
Post "https://eventing-webhook.knative-eventing.svc:443/sinkbindings?timeout=10s":
no endpoints available for service "eventing-webhook"
The test scales the eventing-webhook down to 0 and back to 3. But scaling back to 3 fails.
The new tests from the pull request deploy a ContainerSource, which creates a SinkBinding, and that triggers the issue. Note: I have updated the issue description and title.
My experiments show that adding the label bindings.knative.dev/exclude=true to the knative-eventing namespace fixes the issue. But this is only a workaround, because we already put this label on individual objects.
In the end, to exclude some objects, the user has to put the label bindings.knative.dev/exclude=true on both the namespace and the individual objects; otherwise either the namespace selector or the object selector of the webhook will include them again.
The solution would probably be to remove the namespaceSelector from the MutatingWebhookConfiguration sinkbindings.webhook.sources.knative.dev.
/triage accepted /cc @pierDipi @creydr
I think we can add this to the webhook configuration and it will be preserved [1]:
namespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: NotIn
    values:
    - "knative-eventing"
In this case, would it be possible to exclude individual objects in other namespaces through objectSelector, like in this commit? With the example above, all the other namespaces would already be included, so excluding individual objects would not be possible, IMO. And I guess excluding a whole namespace (if the user wants it) would not be possible either.
We could probably flip the condition to use operator: In, or use just objectSelector for exclusion.
exclusion case:
namespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: NotIn
    values:
    - "knative-eventing"
  - key: bindings.knative.dev/exclude
    operator: NotIn
    values:
    - "true"
matchConditions:
- expression: '!has(request.object.metadata.labels) || !("bindings.knative.dev/exclude" in request.object.metadata.labels) || request.object.metadata.labels["bindings.knative.dev/exclude"] != "true"'
  name: expression
that will only pass when:
given that, from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#mutatingwebhook-v1-admissionregistration-k8s-io:
"MatchConditions is a list of conditions that must be met for a request to be sent to this webhook. Match conditions filter requests that have already been matched by the rules, namespaceSelector, and objectSelector."
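The exclusion case above can be read as a plain predicate. The following is a minimal Python sketch of that reading, not Knative or Kubernetes code. It assumes standard label-selector semantics (NotIn also matches when the key is absent) and that a request reaches the webhook only if the namespaceSelector matches and every match condition is true. The function names are hypothetical, and `obj_labels=None` is used here to model an object with no `metadata.labels` at all.

```python
EXCLUDE = "bindings.knative.dev/exclude"
NS_NAME = "kubernetes.io/metadata.name"


def not_in(labels, key, values):
    """Label-selector NotIn: true when the key is absent or its value is not listed."""
    return labels.get(key) not in values


def namespace_selected(ns_labels):
    """Both namespaceSelector expressions must hold (they are ANDed)."""
    return (not_in(ns_labels, NS_NAME, {"knative-eventing"})
            and not_in(ns_labels, EXCLUDE, {"true"}))


def match_condition(obj_labels):
    """Mirrors the CEL expression: no labels, no exclude key, or exclude != "true"."""
    return (obj_labels is None
            or EXCLUDE not in obj_labels
            or obj_labels[EXCLUDE] != "true")


def webhook_called(ns_labels, obj_labels):
    """Assumed overall rule: selector and match condition must both pass."""
    return namespace_selected(ns_labels) and match_condition(obj_labels)
```

Under these assumptions the request only passes when the namespace is not knative-eventing, the namespace is not labelled bindings.knative.dev/exclude=true, and the object itself is not labelled bindings.knative.dev/exclude=true.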
inclusion case:
namespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: NotIn
    values:
    - "knative-eventing"
  - key: bindings.knative.dev/include
    operator: In
    values:
    - "true"
objectSelector:
  matchExpressions:
  - key: bindings.knative.dev/include
    operator: In
    values:
    - "true"
that will only pass when:
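Read the same way, the inclusion case can be sketched as a predicate over namespace and object labels. This is a minimal Python sketch under the same assumptions: standard label-selector semantics (In requires the key to be present with a listed value, NotIn also matches an absent key), and a request reaches the webhook only if both namespaceSelector and objectSelector match. The function names are hypothetical and are not part of Knative.

```python
INCLUDE = "bindings.knative.dev/include"
NS_NAME = "kubernetes.io/metadata.name"


def is_in(labels, key, values):
    """Label-selector In: the key must be present and its value must be listed."""
    return key in labels and labels[key] in values


def not_in(labels, key, values):
    """Label-selector NotIn: true when the key is absent or its value is not listed."""
    return labels.get(key) not in values


def namespace_selected(ns_labels):
    """Namespace must not be knative-eventing and must opt in via the include label."""
    return (not_in(ns_labels, NS_NAME, {"knative-eventing"})
            and is_in(ns_labels, INCLUDE, {"true"}))


def object_selected(obj_labels):
    """Object must opt in via the include label as well."""
    return is_in(obj_labels, INCLUDE, {"true"})


def webhook_called(ns_labels, obj_labels):
    """Assumed overall rule: both selectors must match."""
    return namespace_selected(ns_labels) and object_selected(obj_labels)
```

Under these assumptions the request only passes when the namespace is not knative-eventing and both the namespace and the object carry bindings.knative.dev/include=true, so anything unlabelled is left alone by default.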
Describe the bug
Running a downgrade from Eventing 1.16 to 1.15 can fail with 'no endpoints available for service "eventing-webhook"' when a ContainerSource (and thus a SinkBinding) is present in the cluster.
The upgrade tests scale the eventing-webhook to zero and back to 3. But scaling back to 3 fails with this error:
There's a MutatingWebhookConfiguration named sinkbindings.webhook.sources.knative.dev which is run by eventing-webhook itself.
As a result, the eventing-webhook is not started again and remains scaled to zero.
It happens on this PR which extends upgrade/downgrade tests in a specific way. Some resources are created before upgrade and verified after upgrade, and some resources are created after upgrade and verified later after downgrade. The tests now include ContainerSource which creates SinkBindings.
Example failure is in this run
However, the eventing-webhook Deployment already has the label bindings.knative.dev/exclude: "true", which should exclude it from the webhook selection.
Expected behavior
The eventing-webhook is up and running after the downgrade.
To Reproduce
Have a ContainerSource in the cluster and downgrade from 1.16 to 1.15. This PR reproduces the behavior reliably.
Knative release version
Downgrading from pre-release 1.16 to 1.15
Additional context