nokia / CPU-Pooler

A Device Plugin for Kubernetes, which exposes the CPU cores as consumable Devices to the Kubernetes scheduler.
BSD 3-Clause "New" or "Revised" License

CPU affinity conflicts with cgroups configuration when creating a pod with exclusive-pool #64

Closed yjy5921591ok closed 3 years ago

yjy5921591ok commented 3 years ago

Describe the bug
CPU affinity conflicts with the cgroups configuration when creating a pod with exclusive-pool.

To Reproduce
Steps to reproduce the behavior:
1. Download the compressed package of the master branch code and copy it to the k8s environment.
2. Compile and install the cpu-pooler components according to the official instructions.
3. For convenience, I put the generated binary, the image, and the dependent YAML files in the same folder and wrote a shell script. The script content is as follows:

#!/bin/bash
kubectl label nodes --all kube-cpu-pool=enabled
mkdir -p /opt/bin
chmod a+x app/cpu-pooler/process-starter
cp app/cpu-pooler/process-starter /opt/bin
kubectl create -f app/cpu-pooler/my-cpu-pooler-config.yaml
kubectl create -f app/cpu-pooler/cpu-dev-ds.yaml
kubectl create -f app/cpu-pooler/cpusetter-ds.yaml
sh app/cpu-pooler/generate-cert.sh
kubectl create -f app/cpu-pooler/webhook-svc-depl.yaml
sh app/cpu-pooler/create-webhook-conf.sh app/cpu-pooler/webhook-conf.yaml
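
A couple of quick sanity checks after running the script can confirm that the webhook and the DaemonSets were actually created; these are just generic kubectl queries, nothing specific to cpu-pooler is assumed here:

kubectl get mutatingwebhookconfigurations
kubectl -n kube-system get daemonsets | grep cpu
kubectl -n kube-system get pods -o wide | grep cpu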

4. The custom ConfigMap is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cpu-pooler-configmap
  namespace: kube-system
data:
  poolconfig-controller.yaml: |
    pools:
      exclusive-pool:
        cpus : "1"
        hyperThreadingPolicy: singleThreaded
      exclusive-numa-0:
        cpus : "2,3"
        hyperThreadingPolicy: multiThreaded
      shared-pool:
        cpus : "4,5"
      default:
        cpus: "0,6,7"
    nodeSelector:
      kube-cpu-pool: enabled

5. So far, all pods related to cpu-pooler seem to be running normally:

kube-system          cpu-dev-pod-mutator-deployment-644d8d6586-xv8n7                1/1     Running
kube-system          cpu-device-plugin-9nllh                                        1/1     Running
kube-system          cpu-device-plugin-fm9dt                                        1/1     Running
kube-system          cpu-setter-sq7lz                                               1/1     Running
kube-system          cpu-setter-x2pp7                                               1/1     Running

The allocation of resources seems to be successful:

Allocatable:
  cpu:                            7800m
  ephemeral-storage:              48294789041
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         15638700Ki
  nokia.k8s.io/exclusive-numa-0:  2
  nokia.k8s.io/exclusive-pool:    1
  nokia.k8s.io/shared-pool:       2k
  pods:                           110
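
For reference, these numbers line up with the pools in the ConfigMap above if one assumes that exclusive pools are advertised as one device per core while the shared pool is advertised in 1/1000-CPU units (that unit interpretation is my assumption from the values, not something stated in this thread):

exclusive-pool   "1"    -> 1 core          -> 1
exclusive-numa-0 "2,3"  -> 2 cores         -> 2
shared-pool      "4,5"  -> 2 cores x 1000  -> 2000, displayed as 2k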

6. Then I created a pod based on the test file given in the source package and adjusted it for my own ConfigMap:

apiVersion: v1
kind: Pod
metadata:
  name: cpupod
  annotations:
    nokia.k8s.io/cpus: |
      [{
      "container": "exclusivetestcontainer",
      "processes":
        [{
           "process": "/bin/sh",
           "args": ["-c","/thread_busyloop -n \"Process \"1"],
           "cpus": 1,
           "pool": "exclusive-pool"
         }
      ]
      }]
spec:
  containers:
  - name: sharedtestcontainer
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 1; done;" ]
    image: busyloop
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: 500Mi
        nokia.k8s.io/shared-pool: "160"
      limits:
        nokia.k8s.io/shared-pool: "160"
        memory: 500Mi
  - name: exclusivetestcontainer
    image: busyloop
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 1; done;" ]
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: 500Mi
        nokia.k8s.io/exclusive-pool: "1"
      limits:
        memory: 500Mi
        nokia.k8s.io/exclusive-pool: "1"
  - name: defaulttestcontainer
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 1; done;" ]
    image: busyloop
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
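
A quick way to see, independent of cpu-pooler's own logging, which CPUs each container is actually allowed to run on is to read the Cpus_allowed_list of its main process (this assumes grep is available in the busyloop image and that the container is up long enough to exec into):

kubectl exec cpupod -c exclusivetestcontainer -- grep Cpus_allowed_list /proc/1/status
kubectl exec cpupod -c sharedtestcontainer -- grep Cpus_allowed_list /proc/1/status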

7. This pod contains three containers. The exclusivetestcontainer keeps trying to restart, while the other two containers are normal. The error of the exclusivetestcontainer is as follows:

error: a container name must be specified for pod cpupod, choose one of: [sharedtestcontainer exclusivetestcontainer defaulttestcontainer]
[root@k8s1 home]# kubectl logs cpupod -c exclusivetestcontainer
Used CPU Pool(s):  exclusive
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) expected cpuset (1)
Cgroup cpuset (0-7) does not match to expected cpuset (1)

8. The resources seem to be allocated correctly:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests              Limits
  --------                       --------              ------
  cpu                            1580m (20%)           4960m (63%)
  memory                         3733004288500m (23%)  5738833152 (35%)
  ephemeral-storage              0 (0%)                0 (0%)
  hugepages-1Gi                  0 (0%)                0 (0%)
  hugepages-2Mi                  0 (0%)                0 (0%)
  nokia.k8s.io/exclusive-numa-0  0                     0
  nokia.k8s.io/exclusive-pool    1                     1
  nokia.k8s.io/shared-pool       160                   160
Events:                          <none>

9. But the cgroups configuration seems to be incorrect:

/sys/fs/cgroup/cpuset/kubepods/cpuset.cpus is empty
/sys/fs/cgroup/cpuset/cpuset.cpus contains 0-7
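
The per-container cpuset directories can also be checked by hand, though the exact layout under /sys/fs/cgroup/cpuset varies with the distribution, container runtime, cgroup driver and pod QoS class, so the commands below are only a sketch of that check:

# locate the cpuset directory of one of the pod's containers
# (container ID taken from "docker ps" or "crictl ps", depending on the runtime)
find /sys/fs/cgroup/cpuset -type d -name '*<container-id>*'
# compare its cpuset.cpus with the parent hierarchy
cat /sys/fs/cgroup/cpuset/kubepods/cpuset.cpus
cat /sys/fs/cgroup/cpuset/cpuset.cpus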

Additional context
This project greatly simplifies CPU pinning operations, and I want to use it to improve my platform. Your guidance is very important to me. Thank you for your contribution.

yjy5921591ok commented 3 years ago

The error log is printed as follows:

2021/04/25 14:09:15 ERROR: Cpuset for the containers of Pod: cpupod with ID: f9c0efcc-cd48-4431-bf85-bb0959aa5e26 could not be re-adjusted, because: cpuset file does not exist for container: ace2008b27c43b31933ab995a3677a005417bf3efcd27788d935405dd1af9838 under the provided cgroupfs hierarchy: /sys/fs/cgroup/cpuset/kubepods
2021/04/25 14:09:15 error unmarshalling kubelet checkpoint file: json: cannot unmarshal object into Go struct field checkpointPodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type []string
2021/04/25 14:09:15 ERROR: Cpuset for the containers of Pod: cpupod ID: f9c0efcc-cd48-4431-bf85-bb0959aa5e26 could not be re-adjusted, because: json: cannot unmarshal object into Go struct field checkpointPodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type []string
2021/04/25 14:09:15 ERROR: Cpuset for the containers of Pod: cpupod with ID: f9c0efcc-cd48-4431-bf85-bb0959aa5e26 could not be re-adjusted, because: cpuset file does not exist for container: 5c60ffa4a3b8f86c27bd9dab6972ea35c236adf05261388abaf82dc1e7ff2b01 under the provided cgroupfs hierarchy: /sys/fs/cgroup/cpuset/kubepods
Levovar commented 3 years ago

@yjy5921591ok: can you try what happens if you ask for exclusive cores from the exclusive-numa-0 pool instead?

Based on the "error unmarshalling kubelet checkpoint file: json: cannot unmarshal object into Go struct field checkpointPodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type []string" error, I'm thinking the problem might be with resource pools which have only one entry. It could be that K8s doesn't consistently marshal this information into JSON when the DeviceID map has only one entry.
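
To spell out what that Go error implies (the concrete layout below is only an illustration, not copied from a real checkpoint file): the struct used when reading the checkpoint expects DeviceIDs to be a plain JSON array ([]string), but the kubelet that wrote the file apparently put a JSON object there, for example device IDs grouped per NUMA node:

"DeviceIDs": ["<some-device-id>"]              <- parses into the []string field
"DeviceIDs": { "0": ["<some-device-id>"] }     <- an object, which []string cannot hold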

yjy5921591ok commented 3 years ago

Using the exclusive-numa-0 pool also shows similar problems. I have two errors here, the unmarshalling error and the cgroup error. Are these two errors due to the same reason? In addition to the master branch, 0.3.0 also has this problem. My current version of k8s is 1.21. How can I avoid this problem? Do I still need to adjust the k8s configuration, or are other measures needed? First of all I want to get cpu-pooler running normally; maybe you guys can give me some guidance. Thanks.

Levovar commented 3 years ago

Yes, the two can be related, but might not be. Note that we are currently refactoring the CPU setting part (https://github.com/nokia/CPU-Pooler/pull/62), because in some cases the events from a Pod coming up do not reach us in time, so by the time we start changing its cpuset the container has already disappeared. With the latest master this should only be an intermittent error, and even when it happens it should self-correct; usually the container comes up on its own after 2-3 automated restarts at the latest.

If your failure is permanent, it probably has a different root cause, and that might be the unmarshalling error. I have never seen this one personally, but we also currently use 1.20, not 1.21; the syntax might have changed? Can you share (or upload somewhere) the "/var/lib/kubelet/device-plugins/kubelet_internal_checkpoint" file from the node where you observe the failure?
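
The checkpoint file is plain JSON, so it can also be inspected directly on the node before sharing it (jq is only used here for pretty-printing, any JSON formatter works):

jq . /var/lib/kubelet/device-plugins/kubelet_internal_checkpoint
# or, without jq:
python3 -m json.tool /var/lib/kubelet/device-plugins/kubelet_internal_checkpoint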

yjy5921591ok commented 3 years ago

Thank you very much for your prompt response; this error has been resolved. It was because my local cgroup path is different from the path mounted into cpusetter. This path is generated automatically when k8s is deployed, and the difference may come from the version. Anyway, thank you for your answers. I will continue to follow this project.