nokia / CPU-Pooler

A Device Plugin for Kubernetes, which exposes the CPU cores as consumable Devices to the Kubernetes scheduler.
BSD 3-Clause "New" or "Revised" License

Various performance and stability enhancements in CPUSetter #70

Closed. Levovar closed this 3 years ago.

Levovar commented 3 years ago

Changing CPUSetter to be multi-threaded. This is needed to keep up with the barrage of Pod activity happening inside bigger systems, and should solve the intermittent issue of containers sometimes not getting their cpuset provisioned in time. Also bumping the K8s dependencies to 19.12, lowering the cache resync period from 30 seconds to 1 second, and adding a watch error handler to re-initialize worker connections in case of an abrupt failure.
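
To give a feel for the shape of these changes, below is a minimal client-go sketch of a multi-worker setter with a 1-second informer resync and a watch error handler. The worker count, function names and log messages are illustrative assumptions, not the actual CPU-Pooler code.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
	"k8s.io/klog/v2"
)

const workerCount = 4 // illustrative; not the value used by CPU-Pooler

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Cache resync lowered from 30 seconds to 1 second.
	factory := informers.NewSharedInformerFactory(client, 1*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	// Watch error handler: log abrupt watch failures; the reflector then
	// re-lists and re-watches, effectively re-initializing the connection.
	_ = podInformer.SetWatchErrorHandler(func(r *cache.Reflector, err error) {
		klog.Errorf("pod watch broke, re-establishing: %v", err)
	})

	stop := make(chan struct{})
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.HasSynced)

	// Multiple workers drain the queue in parallel so a burst of Pod activity
	// cannot delay cpuset provisioning.
	for i := 0; i < workerCount; i++ {
		go func() {
			for {
				key, shutdown := queue.Get()
				if shutdown {
					return
				}
				if err := provisionCpuset(key.(string)); err != nil {
					queue.AddRateLimited(key)
				} else {
					queue.Forget(key)
				}
				queue.Done(key)
			}
		}()
	}
	select {} // block forever; the informer and workers do the work
}

// provisionCpuset is a hypothetical stand-in for the setter's per-Pod work.
func provisionCpuset(key string) error {
	klog.Infof("provisioning cpuset for %s", key)
	return nil
}
```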

Levovar commented 3 years ago

So it turns out the performance issues were mainly caused by the ONSLAUGHT of UPDATE operations simply flooding the Controller. We only handle UPDATEs to recognize when a container was restarted, but at least 5-10 UPDATE operations arrive at every setter, on every node, for every Pod, which is simply not sustainable regardless of how many worker threads we start.

In response, the architecture was changed to handle only CREATE operations: instead of relying on continuous UPDATEs to notify us about state changes, we pro-actively pull the changes for the Pods we know we need to handle. This change in approach, coupled with the introduction of parallel worker threads, eliminated all contention and guarantees that new Pods are always handled.
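
A rough sketch of this pull-based handling, under the same assumptions as the sketch above; `processPod` and `adjustCpusets` are hypothetical names standing in for the setter's per-Pod logic.

```go
package setter

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// processPod is invoked once per CREATE event. Instead of waiting for UPDATE
// events to trickle in, it polls the API server for this one Pod until all of
// its containers have been created, then adjusts their cpusets.
func processPod(client kubernetes.Interface, key string) error {
	ns, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
	return wait.PollImmediate(500*time.Millisecond, 2*time.Minute, func() (bool, error) {
		pod, err := client.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, nil // transient error: keep polling
		}
		if len(pod.Status.ContainerStatuses) != len(pod.Spec.Containers) {
			return false, nil
		}
		for _, status := range pod.Status.ContainerStatuses {
			if status.ContainerID == "" {
				return false, nil // container not created yet, its cgroup does not exist
			}
		}
		return true, adjustCpusets(pod)
	})
}

// adjustCpusets is a hypothetical stand-in for writing the computed cpusets
// into each container's cpuset cgroup.
func adjustCpusets(pod *corev1.Pod) error {
	return nil
}
```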

It also means that this architecture change temporarily breaks re-provisioning the cpuset of containers belonging to already adjusted Pods. We will re-introduce this functionality in the next PR by implementing an independent reconciliation loop that scours the cpuset cgroup hierarchy for un-adjusted leaves.
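
For a rough idea of what such a loop could look like, here is a hedged sketch; the cgroup mount path and the notion of an 'un-adjusted' leaf (cpuset.cpus still equal to the root set) are placeholder assumptions, not the final design of the upcoming PR.

```go
package main

import (
	"bytes"
	"os"
	"path/filepath"
	"time"

	"k8s.io/klog/v2"
)

// Assumed cgroup v1 cpuset mount point for kubepods; not taken from the repo.
const cpusetRoot = "/sys/fs/cgroup/cpuset/kubepods"

// isLeaf reports whether a cgroup directory has no child cgroups.
func isLeaf(path string) bool {
	entries, err := os.ReadDir(path)
	if err != nil {
		return false
	}
	for _, e := range entries {
		if e.IsDir() {
			return false
		}
	}
	return true
}

// reconcile walks the cpuset hierarchy once and reports every leaf whose
// cpuset.cpus still equals the root set, i.e. was never adjusted by the setter.
func reconcile() {
	rootCpus, err := os.ReadFile(filepath.Join(cpusetRoot, "cpuset.cpus"))
	if err != nil {
		klog.Errorf("cannot read root cpuset: %v", err)
		return
	}
	filepath.Walk(cpusetRoot, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || !info.IsDir() || path == cpusetRoot || !isLeaf(path) {
			return nil
		}
		cpus, err := os.ReadFile(filepath.Join(path, "cpuset.cpus"))
		if err != nil {
			return nil
		}
		if bytes.Equal(bytes.TrimSpace(cpus), bytes.TrimSpace(rootCpus)) {
			// The real setter would re-run its provisioning logic for this container.
			klog.Infof("un-adjusted cpuset leaf: %s", path)
		}
		return nil
	})
}

func main() {
	// Runs independently of the event-driven path.
	for range time.Tick(30 * time.Second) {
		reconcile()
	}
}
```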

Levovar commented 3 years ago

Tested by continuously redeploying 12 Pods at the same time, each Pod having two containers, one asking for exclusive CPUs and one asking for shared:

    spec:
      securityContext:
        runAsUser: 1000
      containers:
      - name: cpu-pooling1
        image: {{ .Values.registry_path }}/alpine_test:latest
        imagePullPolicy: IfNotPresent
        command: ["/usr/bin/dumb-init", "-c", "--"]
        args: ["sleep", "6000"]
        resources:
          requests:
            nokia.k8s.io/exclusive_numa_0_pool: {{ .Values.exclusive_pool_req }}
          limits:
            nokia.k8s.io/exclusive_numa_0_pool: {{ .Values.exclusive_pool_req }}
      - name: cpu-pooling2
        image: {{ .Values.registry_path }}/alpine_test:latest
        command: ["/bin/sh", "-c", "--"]
        args: ["while true; do echo \"Test\"; sleep 1; done;"]
        resources:
          requests:
            nokia.k8s.io/shared_pool: {{ .Values.shared_pool_req }}
          limits:
            nokia.k8s.io/shared_pool: {{ .Values.shared_pool_req }}

Zero restarts were observed in any of the deployments:

# kubectl get po
NAME                             READY   STATUS    RESTARTS   AGE
cpu-pooling-5-75c8794974-bn747   2/2     Running   0          8m21s
cpu-pooling-5-75c8794974-cwn42   2/2     Running   0          8m21s
cpu-pooling-5-75c8794974-p656r   2/2     Running   0          8m21s
cpu-pooling-5-75c8794974-rw6rk   2/2     Running   0          8m21s
cpu-pooling-5-75c8794974-xhf7b   2/2     Running   0          8m21s
cpu-pooling-5-75c8794974-xrtr6   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-6qjxn   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-6vglj   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-csljn   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-f8cvx   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-qfjxg   2/2     Running   0          8m21s
cpu-pooling-6-6c5b4547bc-tqls9   2/2     Running   0          8m21s

So the refactor part is complete; on to the reconciliation loop.