What k8s version are you using (kubectl version)?:
v1.29
What did you expect to happen?:
Right now, the requests the vpa-admission-controller handles are not contextified. For example, in the handler for Pods, context.TODO() is used in a few places where the admission-controller makes requests along the way:
Due to this usage of context.TODO(), when the caller (the kube-apiserver) cancels the request (due to a client-side timeout), the admission-controller's Pod handler is not notified and continues to process the request even though it has been cancelled client-side.
We recently faced a VPA-related outage (described in https://github.com/kubernetes/autoscaler/issues/6884) where the vpa-admission-controller was client-side throttled due to the low default kube-api-qps/burst settings.
From the logs we can see that it was throttled for more than 50 minutes:
{"log":"Waited for 51m21.05416376s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m21.024486679s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m20.527328217s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m19.975656855s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m19.466347921s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
{"log":"Waited for 51m18.572692764s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver/apis/monitoring.coreos.com/v1/namespaces/foo/prometheuses/bar/scale","pid":"1","severity":"INFO","source":"request.go:697"}
Hence, the vpa-admission-controller currently waits out the client-side throttling (> 50 min) instead of canceling the request.
Meanwhile, the kube-apiserver cancelled the request after the timeout configured in the webhook (10s in our case):
E0604 13:21:25.379831 1 dispatcher.go:214] failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook:443/?timeout=10s": context deadline exceeded
What happened instead?:
See above.
How to reproduce it (as minimally and precisely as possible):
Add a long sleep (longer than the kube-apiserver's webhook timeout) in the Pod handler and verify that the admission request keeps processing after the kube-apiserver has cancelled the request client-side.
Which component are you using?:
vertical-pod-autoscaler
What version of the component are you using?:
Component version: 1.1.2
Anything else we need to know?:
N/A