openkruise / charts

OpenKruise Helm Charts.
Apache License 2.0
9 stars 24 forks source link

Dangerous MutatingWebhook for all "pod" resources that cannot be disabled in Helm values #90

Open vpedosyuk opened 5 months ago

vpedosyuk commented 5 months ago

I'm talking mainly about this part of Helm templates: https://github.com/openkruise/charts/blob/d1c325e49cedb38ea4793ad87d78dd4367b2eeaa/versions/kruise/1.5.2/templates/webhookconfiguration.yaml#L6-L31

It creates a webhook for any "CREATE" operation of the "pod" resource - this basically affects everything in a Kubernetes cluster. There can be various reasons for OpenKruise controller to be unavailable, thus, it might be very useful to have an option to make the webhook isolated only to resources in "*.kruise.io" API group.

Here's an easy example when I scaled "manager" to 0 replicas and trying to create a dummy pod:

$ kubectl run busybox --image=busybox
Error from server (InternalError): Internal error occurred: failed calling webhook "mpod.kb.io": failed to call webhook: Post "https://kruise-webhook-service.kruise-system.svc:443/mutate-pod?timeout=10s": no endpoints available for service "kruise-webhook-service"
furykerry commented 4 months ago

many openkruise features rely on the pod creation webhook ,such as sidecarset and workloadspread, if you does not rely on such features, you can disable the webhook by turning off the PodWebhook feature gates in the helm chart .

vpedosyuk commented 4 months ago

@furykerry yes, I can set featureGates: PodWebhook=false in the Helm chart but, unfortunately, it applies only to the Kruise controller application itself (through env vars) and it doesn't directly apply to the MutatingWebhookConfiguration and ValidatingWebhookConfiguration Kubernetes resources.

So I made a PR to fix that (and some other issues): https://github.com/openkruise/charts/pull/91

furykerry commented 3 months ago

@vpedosyuk thanks for your contribution, most change of your patch are included in the 1.6.0 chart along with other feature release: https://github.com/openkruise/charts/pull/95/files#diff-0c128c764f13a0883f1f188d84b75bdd0e4112d272d4e40efd84a54f5af56a43

The one change is missing however, that is the one to change failpolicy of pod-webhook, we think the webhook function is crucial for the feature that rely on webhook e.g. sidecarset, simply skipping the webhook can make kruise working incorrect which may introduce unexpected problem to the cluster. A more complete solution is required which may involve mark the pod that is actually managed by kruise with label, so we can leave the issue open for future implementation.