knative / operator

Combined operator for Knative.

Maintenance precondition check failed because of failurePolicy "Fail" on web hooks #1862

Open berendt opened 2 months ago

berendt commented 2 months ago

We want to use knative on Kubernetes clusters managed by Gardener. There we have the following issue when working with the knative operator:

Maintenance precondition check failed. Gardener may be unable to perform required actions during maintenance: ValidatingWebhookConfiguration "config.webhook.istio.networking.internal.knative.dev" is problematic: webhook "config.webhook.istio.networking.internal.knative.dev" with failurePolicy "Fail" and 10s timeout might prevent worker nodes from properly joining the shoot cluster

As a result, we cannot maintain or hibernate the cluster because of the failed precondition. It is possible to work around this by manually changing the failure policy on the following webhooks:

Namespace          Configuration-Type               Configuration-Name
knative-serving    ValidatingWebhookConfiguration   config.webhook.istio.networking.internal.knative.dev
knative-eventing   ValidatingWebhookConfiguration   config.webhook.istio.networking.internal.knative.dev
knative-eventing   ValidatingWebhookConfiguration   config.webhook.serving.knative.dev
knative-eventing   MutatingWebhookConfiguration     webhook.istio.networking.internal.knative.dev

However, this is only temporary; the manual changes are of course overwritten again.

We have not yet found a way to customise this in the knative operator. We have no influence on the Gardener side as it is a managed service that we use for Kubernetes. Any ideas?

houshengbo commented 2 months ago

@berendt Do you suggest any changes to the existing ValidatingWebhookConfiguration and MutatingWebhookConfiguration?

houshengbo commented 2 months ago

It is true that the Knative operator cannot configure the failurePolicy of any existing ValidatingWebhookConfiguration or MutatingWebhookConfiguration in eventing and serving.

To work around this, the only option I can think of with the operator is to use customized manifests in append mode: https://knative.dev/docs/install/operator/configuring-serving-cr/#append-mode You can consolidate all the changes for ValidatingWebhookConfiguration and MutatingWebhookConfiguration into two files, one for serving and one for eventing.

Either publish the file somewhere accessible to your kube cluster, so that the CR picks it up as additional resources, overriding existing ones if necessary, or use a local volume to host the additional manifests, as described here: https://vincenthou.medium.com/how-to-customize-the-manifests-for-knative-operator-with-a-local-volume-c576b592d9d7
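
As a rough illustration of the append-mode suggestion, here is a minimal Python sketch that builds a KnativeServing custom resource pointing at one extra manifest. The `spec.additionalManifests` list with a `URL` key and the `operator.knative.dev/v1beta1` API version are assumptions based on the linked operator documentation and should be checked against your operator version; the function name and the example URL are hypothetical placeholders.

```python
import json


def serving_cr_with_additional_manifest(manifest_url, namespace="knative-serving"):
    """Build a KnativeServing CR (as a plain dict) that appends one extra manifest.

    Assumption: the operator reads extra manifests from
    spec.additionalManifests[].URL, as described in the linked docs.
    """
    return {
        "apiVersion": "operator.knative.dev/v1beta1",  # assumed API version
        "kind": "KnativeServing",
        "metadata": {"name": "knative-serving", "namespace": namespace},
        "spec": {
            # Each entry points at a file holding the customized resources.
            "additionalManifests": [{"URL": manifest_url}],
        },
    }


if __name__ == "__main__":
    # Hypothetical location of the customized webhook configurations.
    cr = serving_cr_with_additional_manifest(
        "https://example.com/knative-serving-webhooks.yaml"
    )
    # JSON output can be piped to `kubectl apply -f -`.
    print(json.dumps(cr, indent=2))
```

The same shape would presumably apply to a KnativeEventing resource with its own manifest file, which matches the "one for serving and one for eventing" split above.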

rhizoet commented 1 month ago

Many thanks for the tip. It is a bit inconvenient that you have to upload a file somewhere in order to integrate it, but it works. We have adapted both webhooks accordingly and integrated them dynamically from the URL.

No more errors and the webhooks are created directly with failurePolicy: Ignore.
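
For anyone who wants to reproduce this, the adaptation can be sketched roughly as follows. This is not the exact script we used, just a minimal Python illustration that assumes the webhook configuration has been exported as JSON (for example with `kubectl get ... -o json`) and loaded into a dict. The function name `with_failure_policy` and the sample webhook entry are hypothetical.

```python
import copy
import json

WEBHOOK_KINDS = {"ValidatingWebhookConfiguration", "MutatingWebhookConfiguration"}
VALID_POLICIES = {"Ignore", "Fail"}


def with_failure_policy(config, policy="Ignore"):
    """Return a copy of a webhook configuration with failurePolicy set on every webhook."""
    if config.get("kind") not in WEBHOOK_KINDS:
        raise ValueError(f"not a webhook configuration: {config.get('kind')!r}")
    if policy not in VALID_POLICIES:
        raise ValueError(f"unsupported failurePolicy: {policy!r}")
    patched = copy.deepcopy(config)  # leave the caller's object untouched
    for webhook in patched.get("webhooks") or []:
        webhook["failurePolicy"] = policy
    return patched


if __name__ == "__main__":
    # Trimmed-down, hypothetical example of an exported configuration.
    original = {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "config.webhook.istio.networking.internal.knative.dev"},
        "webhooks": [
            {
                "name": "config.webhook.istio.networking.internal.knative.dev",
                "failurePolicy": "Fail",
                "timeoutSeconds": 10,
            }
        ],
    }
    print(json.dumps(with_failure_policy(original), indent=2))
```

The patched output is what then goes into the additional manifest file that the operator CR references.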