intel / helm-charts

Bug: device-plugin-operator expects crd sgx.mutator.webhooks.intel.com #46

Closed buroa closed 8 months ago

buroa commented 8 months ago

I was going through some logs and noticed there is a lot of spam from this specifically: https://github.com/intel/helm-charts/blob/main/charts/device-plugin-operator/templates/operator.yaml#L696

Failed calling webhook "sgx.mutator.webhooks.intel.com": failed to call webhook: the server could not find the requested resource

It looks like installing https://github.com/intel/helm-charts/tree/main/charts/device-plugin-operator expects this mutating webhook CRD to be there, though it's not.

buroa commented 8 months ago

Potentially related: https://github.com/intel/intel-device-plugins-for-kubernetes/issues/539

mythi commented 8 months ago

@buroa thanks for the report.

Can you add the version of the chart(s) you're using and how they were installed? Which plugin charts did you deploy?

buroa commented 8 months ago

> @buroa thanks for the report.
>
> Can you add the version of the chart(s) you're using and how they were installed? Which plugin charts did you deploy?

Hey @mythi,

0.28.0 and it's installed with Flux via a HelmRelease spec. See here: https://github.com/buroa/k8s-gitops/blob/master/kubernetes/apps/kube-system/intel-device-plugin/app/helmrelease.yaml

The plugin I'm installing is intel-device-plugins-gpu, also version 0.28.0. Same way: https://github.com/buroa/k8s-gitops/blob/master/kubernetes/apps/kube-system/intel-device-plugin/gpu/helmrelease.yaml

mythi commented 8 months ago

That webhook is "built-in" to that operator, so the errors may show up while the webhook isn't serving correctly. Do the errors keep repeating in the logs, or was it a startup-time occurrence?

buroa commented 8 months ago

> That webhook is "built-in" to that operator, so the errors may show up while the webhook isn't serving correctly. Do the errors keep repeating in the logs, or was it a startup-time occurrence?

It's spamming the kube-apiserver logs.

mythi commented 8 months ago

I'm not able to reproduce this in my non-Helm installation. Is it possible you have multiple MutatingWebhookConfigurations for this? Check with: kubectl get MutatingWebhookConfiguration -A

My setup shows:

kubectl get MutatingWebhookConfiguration inteldeviceplugins-mutating-webhook-configuration
NAME                                                WEBHOOKS   AGE
inteldeviceplugins-mutating-webhook-configuration   9          6h27m

onedr0p commented 8 months ago

I'm having the same issue as @buroa and I also have this installed via Helm.

❯ kubectl get MutatingWebhookConfiguration -n tools
NAME                                                WEBHOOKS   AGE
inteldeviceplugins-mutating-webhook-configuration   9          2d11h

devin@k8s-0 ~> sudo journalctl -u k3s -r | grep sgx
Dec 18 19:54:55 k8s-0 k3s[6518]: E1218 19:54:55.031563    6518 dispatcher.go:214] failed calling webhook "sgx.mutator.webhooks.intel.com": failed to call webhook: the server could not find the requested resource
Dec 18 19:54:55 k8s-0 k3s[6518]: W1218 19:54:55.031380    6518 dispatcher.go:210] Failed calling webhook, failing open sgx.mutator.webhooks.intel.com: failed calling webhook "sgx.mutator.webhooks.intel.com": failed to call webhook: the server could not find the requested resource
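
To tell a one-off startup error from continuous spam, the failures can be tallied per webhook. This is a minimal sketch that assumes log lines in the format shown above; `count_webhook_failures` and `ERROR_LINE` are hypothetical names, not part of k3s or the operator. Only the E (error) lines are counted, because each failed call also produces a W (warning) line that repeats the same message.

```python
import re
from collections import Counter

# klog error lines look like: E1218 19:54:55.031563 ... failed calling webhook "<name>": ...
# The leading "E" + MMDD restricts the match to error-severity lines, so the
# companion W (warning) line for the same failed call is not counted twice.
ERROR_LINE = re.compile(
    r'\bE\d{4} \d{2}:\d{2}:\d{2}\.\d+.*?failed calling webhook "([^"]+)"'
)


def count_webhook_failures(lines):
    """Return a Counter mapping webhook name -> number of error lines seen."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two log lines quoted in this thread, used as sample input.
SAMPLE = [
    'Dec 18 19:54:55 k8s-0 k3s[6518]: E1218 19:54:55.031563    6518 dispatcher.go:214] '
    'failed calling webhook "sgx.mutator.webhooks.intel.com": failed to call webhook: '
    'the server could not find the requested resource',
    'Dec 18 19:54:55 k8s-0 k3s[6518]: W1218 19:54:55.031380    6518 dispatcher.go:210] '
    'Failed calling webhook, failing open sgx.mutator.webhooks.intel.com: '
    'failed calling webhook "sgx.mutator.webhooks.intel.com": failed to call webhook: '
    'the server could not find the requested resource',
]

if __name__ == "__main__":
    for name, count in count_webhook_failures(SAMPLE).most_common():
        print(f"{name}: {count}")
```

A count that keeps growing between runs points to a persistent misconfiguration rather than a startup race.
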

It looks like this might be causing the issue?

https://github.com/intel/helm-charts/blob/main/charts/device-plugin-operator/templates/operator.yaml#L688C1-L708C20

Should this helm chart be updated to include only the configuration we need depending on what we choose?

mythi commented 8 months ago

> Should this helm chart be updated to include only the configuration we need depending on what we choose?

We can look into it later, but doing that is not the fix for this problem.

I'm able to reproduce it here when I install the chart to some other namespace than the one we use by default (inteldeviceplugins-system)...

onedr0p commented 8 months ago

That namespace name is a bit too wordy for my liking but glad you were able to reproduce 😆 Thanks!

mythi commented 8 months ago

My comment about the namespace was wrong. The issue is with the webhook path at https://github.com/intel/helm-charts/blob/d06026544bb67724ad50c2b4b53eec52518813f5/charts/device-plugin-operator/templates/operator.yaml#L694, which should be /mutate--v1-pod. We will fix it as part of our 0.29 updates this week.
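
The double dash in /mutate--v1-pod is not a typo. The sketch below assumes the operator serves its pod webhook under controller-runtime's default path convention, `/mutate-<group>-<version>-<kind>`, with dots in the group replaced by dashes and the kind lowercased. `mutate_path` is a hypothetical Python restatement of that convention for illustration, not the operator's own code. Pods live in the core API group, whose name is the empty string, so the group segment is empty and two dashes end up adjacent.

```python
def mutate_path(group: str, version: str, kind: str) -> str:
    """Build a webhook path following the assumed /mutate-<group>-<version>-<kind> rule."""
    return "/mutate-" + group.replace(".", "-") + "-" + version + "-" + kind.lower()


if __name__ == "__main__":
    # Core group ("") + v1 + Pod: the empty group leaves a double dash.
    print(mutate_path("", "v1", "Pod"))  # /mutate--v1-pod

    # A made-up group and kind, only to show how a non-empty group is rendered.
    print(mutate_path("example.com", "v1", "Widget"))  # /mutate-example-com-v1-widget
```

If the path in the MutatingWebhookConfiguration differs from the path the server actually registers, the API server's call gets a 404, which matches the "could not find the requested resource" message in the logs.
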

mythi commented 8 months ago

@buroa @onedr0p Does 0.29 work OK for you?

buroa commented 8 months ago

> @buroa @onedr0p Does 0.29 work OK for you?

@mythi From what I can see in the kube-apiserver logs, the error is no longer there :)

onedr0p commented 8 months ago

@mythi looks fine on my end. Many thanks and happy holidays! ☃️