Closed · budimanjojo closed this 3 months ago
Maybe this line is set too low https://github.com/kubernetes-sigs/node-feature-discovery/blob/560905fbee7bb8fe475831cc3b86f3a62d78d43e/deployment/helm/node-feature-discovery/values.yaml#L533
Or maybe there's a bug in the garbage collection logic causing it to consume too many resources.
Thanks @budimanjojo for reporting this. How big is your cluster (ca. how many nodes)?
In retrospect, setting the CPU limits might not have been such a good idea. We might want to remove those (and cut a patch release) 🧐
The most immediate fix for you would probably be to remove the CPU limit, i.e. do the Helm install with `--set gc.resources.limits.cpu=null`.
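For reference, the same override expressed as a values file might look like the sketch below (setting a key to `null` tells Helm to drop the chart's default for that key; the `gc.resources.limits.cpu` path matches the `--set` flag above):

```yaml
# values-override.yaml -- a sketch: null removes the chart's default CPU
# limit on the gc pod so it is no longer throttled by the CFS quota.
gc:
  resources:
    limits:
      cpu: null
```

This would then be applied with `helm upgrade ... -f values-override.yaml` instead of the single `--set` flag.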
Hi @marquiz! I have a 3-node cluster, so it's a pretty small one.

Yeah, I agree with having no CPU limits set, at least in the `gc` pod, by default. Should I open a PR, or should I just wait?
> I have a 3-node cluster, so it's a pretty small one.
OK, not a huge one, then. 😅 Looks like we need to investigate that a bit further 🤔
> Yeah, I agree with having no CPU limits set, at least in the `gc` pod, by default. Should I open a PR, or should I just wait?
Please do, more contributors -> better 😊 Let's remove CPU limits for all daemons. Also, we need to update the tables of parameters in docs/deployment/helm.md accordingly (for the defaults).
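Removing the CPU limits for all daemons via a values override could be sketched roughly as below. The section names (`master`, `worker`, `topologyUpdater`, `gc`) are assumptions based on the chart's layout and should be double-checked against values.yaml:

```yaml
# Sketch of a values override dropping the default CPU limit for every
# NFD daemon; null removes the inherited chart default for that key.
# Section names are assumptions -- verify against the chart's values.yaml.
master:
  resources:
    limits:
      cpu: null
worker:
  resources:
    limits:
      cpu: null
topologyUpdater:
  resources:
    limits:
      cpu: null
gc:
  resources:
    limits:
      cpu: null
```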
@marquiz I just created the PR, please take a look. I removed the CPU limits for all daemons instead of just the garbage collection pod, as you recommended.
What happened: After updating to v0.16.0, I keep getting the `CPUThrottlingHigh` alert on the garbage collection pod like this:

What you expected to happen: Everything should be running like it used to be. I have fairly default Helm values:
How to reproduce it (as minimally and precisely as possible): Use the latest v0.16.0 with the values above.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.30.0
- OS (e.g: `cat /etc/os-release`): Talos Linux
- Kernel (e.g. `uname -a`): 6.6.29-talos