uswitch / nidhogg

Kubernetes Node taints based on Daemonset Pods
Apache License 2.0

controller: use a specific enqueue strategy #18

Closed JulienBalestra closed 4 years ago

JulienBalestra commented 4 years ago

The current enqueue strategy for daemonset pods should enqueue the pod's .spec.nodeName instead of the controller namespace/name itself.
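For illustration, here is a minimal sketch of that enqueue strategy, assuming a recent controller-runtime API (the exact handler and builder signatures vary by version, and the names below are illustrative rather than this PR's actual code): pod events are mapped to the node named in .spec.nodeName, so the work item is the node itself.

package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// podToNodeRequest maps a daemonset pod event to a reconcile request for the
// node it runs on, instead of enqueueing the pod's own namespace/name.
func podToNodeRequest(ctx context.Context, obj client.Object) []reconcile.Request {
	pod, ok := obj.(*corev1.Pod)
	if !ok || pod.Spec.NodeName == "" {
		// Nothing to enqueue until the pod is bound to a node.
		return nil
	}
	return []reconcile.Request{
		{NamespacedName: types.NamespacedName{Name: pod.Spec.NodeName}},
	}
}

// setupWatches reconciles nodes and re-enqueues the owning node whenever a
// watched daemonset pod changes.
func setupWatches(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		Watches(&corev1.Pod{}, handler.EnqueueRequestsFromMapFunc(podToNodeRequest)).
		Complete(r)
}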

To illustrate the problem, take the following daemonset configuration:

{
  "nodeSelector": {},
  "daemonsets": [
    {
      "name": "kube2iam",
      "namespace": "kube2iam"
    },
    {
      "name": "kube-proxy",
      "namespace": "kube-system"
    },
    {
      "name": "node-local-dns",
      "namespace": "coredns"
    },
    {
      "name": "local-volume-provisioner",
      "namespace": "local-volume-provisioner"
    }
  ]
}

All the daemonsets are added to the queue and go down this code path, because the local-volume-provisioner pod is not found on the node.

With this issue, on large clusters the nidhogg queue depth stays too high and the controller, running at 100%, can take up to 20 minutes to remove the taints even though the required daemonset pods are effectively running.

# TYPE workqueue_depth gauge
workqueue_depth{name="node-controller"} 1929

This is the instrumented nidhogg profile: [profile image]

This is the profile with this PR applied as a patch and 3 concurrent reconcilers (the default is 1): [profile image]
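For context on the second profile, here is a minimal sketch of how the reconciler concurrency can be raised, assuming controller-runtime's builder options rather than nidhogg's actual flag (names are illustrative):

package controller

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// buildNodeController wires the node reconciler with 3 concurrent workers
// instead of the default single worker.
func buildNodeController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 3}).
		Complete(r)
}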

Nodes

Create: from what I observed, it is worth processing node Create events to add the taint immediately.

Pods

Create: pods of daemonsets managed by the kube-controller-manager already have their .spec.nodeName set when they are created.

Update/Delete: any pod phase transition is worth catching here (see the predicate sketch below).
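A minimal predicate sketch matching the notes above, assuming controller-runtime predicates (illustrative names, not this PR's actual code): pod Create events are actionable right away because .spec.nodeName is already set, and Update events only matter when the phase changes.

package controller

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

var podEventsPredicate = predicate.Funcs{
	CreateFunc: func(e event.CreateEvent) bool {
		// Daemonset pods are created with .spec.nodeName already set,
		// so Create events are immediately actionable.
		pod, ok := e.Object.(*corev1.Pod)
		return ok && pod.Spec.NodeName != ""
	},
	UpdateFunc: func(e event.UpdateEvent) bool {
		// Only re-enqueue the node when the pod phase actually changes.
		oldPod, okOld := e.ObjectOld.(*corev1.Pod)
		newPod, okNew := e.ObjectNew.(*corev1.Pod)
		return okOld && okNew && oldPod.Status.Phase != newPod.Status.Phase
	},
	DeleteFunc: func(e event.DeleteEvent) bool {
		// A deleted daemonset pod may require re-tainting its node.
		return true
	},
}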