The current enqueue strategy for daemonset pods enqueues the controller `namespace/name` instead of the pod's `.spec.nodeName`. For example, with a daemonset configuration that includes `local-volume-provisioner`, all the daemonset pods are added to the queue and go to this code path, because the lookup for a node named `local-volume-provisioner` fails.
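As a rough illustration of the intended strategy (a sketch only, not nidhogg's actual code, and assuming a recent controller-runtime API where the map function receives a `client.Object`), the pod-to-request mapping would key the request on the pod's node rather than on its owning controller:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// mapPodToNode is a hypothetical map function: it enqueues a reconcile
// request for the node the daemonset pod is scheduled on (.spec.nodeName)
// instead of the owning controller's namespace/name.
func mapPodToNode(obj client.Object) []reconcile.Request {
	pod, ok := obj.(*corev1.Pod)
	if !ok || pod.Spec.NodeName == "" {
		// Not a pod, or not scheduled yet: nothing to reconcile.
		return nil
	}
	// Node objects are cluster-scoped, so only the name is set.
	return []reconcile.Request{
		{NamespacedName: types.NamespacedName{Name: pod.Spec.NodeName}},
	}
}
```

Such a function would typically be wired in with `handler.EnqueueRequestsFromMapFunc`; the exact signature depends on the controller-runtime version nidhogg is built against.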
With this issue, on large clusters the queue depth of nidhogg stays very high and the controller, at 100% CPU, can take up to 20 minutes to remove the taints while the required daemonset pods are effectively running:
```
# TYPE workqueue_depth gauge
workqueue_depth{name="node-controller"} 1929
```
This is the instrumented nidhogg profile:
This is the profile with this PR applied as a patch and 3 concurrent reconcilers (the default is 1):
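As a side note on the concurrency setting, controller-runtime exposes the number of workers through `controller.Options.MaxConcurrentReconciles`; a minimal sketch (builder-style API, illustrative names, not nidhogg's actual wiring) of running 3 concurrent reconcilers:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// setupNodeController registers a node reconciler with 3 concurrent
// workers instead of the default of 1, so a large backlog of node
// requests drains faster.
func setupNodeController(mgr ctrl.Manager, r ctrl.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 3}).
		Complete(r)
}
```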
Nodes
Create: From what I observed, it is worth processing node Create events so the taint is added immediately.
Pods
Create: Daemonset pods managed by the kube-controller-manager already have their `.spec.nodeName` set when they are created.
Update/Delete: Any pod phase transition is worth catching there (see the sketch below).
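A possible shape for these pod event filters, again only as a sketch against a recent controller-runtime API rather than nidhogg's actual code:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// podEvents is a hypothetical filter for daemonset pod events, following
// the notes above: react to Create (nodeName is already set by the
// kube-controller-manager), to phase transitions on Update, and to Delete.
var podEvents = predicate.Funcs{
	CreateFunc: func(e event.CreateEvent) bool {
		pod, ok := e.Object.(*corev1.Pod)
		// Only enqueue pods that are already scheduled on a node.
		return ok && pod.Spec.NodeName != ""
	},
	UpdateFunc: func(e event.UpdateEvent) bool {
		oldPod, okOld := e.ObjectOld.(*corev1.Pod)
		newPod, okNew := e.ObjectNew.(*corev1.Pod)
		// Only phase transitions (e.g. Pending -> Running) are relevant.
		return okOld && okNew && oldPod.Status.Phase != newPod.Status.Phase
	},
	DeleteFunc: func(e event.DeleteEvent) bool {
		// A deleted daemonset pod may require re-adding the taint.
		return true
	},
}
```

Node Create events would be left unfiltered so the taint is added as soon as a new node appears.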