nokia / CPU-Pooler

A Device Plugin for Kubernetes, which exposes the CPU cores as consumable Devices to the Kubernetes scheduler.
BSD 3-Clause "New" or "Revised" License

Container restarted due to Cgroup cpuset does not match to expected cpuset #44

Closed YanXiaoping2 closed 4 years ago

YanXiaoping2 commented 4 years ago

Hi,

The container is restarted once right after it is deployed, and then it comes up and keeps running. This seems similar to the closed issue https://github.com/nokia/CPU-Pooler/issues/36.

The NCIR version is below (I don't know how to find the CPU-Pooler version; please tell me if it's needed):

[cranuser6@controller-3 ~]$ cat /etc/ncir-release
release=NCI_R19-29
build=5968.g2838975

[cranuser6@controller-3 rcp-pod-up]$ kubectl logs -p yxp-up-deployment-0 -c l2hi-container
Used CPU Pool(s): exclusive&shared
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) expected cpuset (4-6,24-26,28,37-38)
Cgroup cpuset (0-39) does not match to expected cpuset (4-6,24-26,28,37-38)
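For readers unfamiliar with the cpuset list notation in the log above, here is a minimal sketch of the kind of comparison the message describes. It assumes the standard Linux cpuset list format (comma-separated ids and inclusive ranges). The function names `parse_cpuset` and `cpuset_matches` are hypothetical and are not taken from the CPU-Pooler source; the actual check in the process starter may differ.

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a cpuset list string such as '4-6,28' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # tolerate empty fragments, e.g. from an empty string
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def cpuset_matches(actual: str, expected: str) -> bool:
    """Return True when both strings describe exactly the same set of CPUs."""
    return parse_cpuset(actual) == parse_cpuset(expected)


# Values copied from the log above.
actual = "0-39"
expected = "4-6,24-26,28,37-38"

print(cpuset_matches(actual, expected))  # False
print(sorted(parse_cpuset(expected)))    # [4, 5, 6, 24, 25, 26, 28, 37, 38]
```

With the logged values the container still sees all 40 host CPUs (0-39) instead of the 9 CPUs it was expected to be pinned to, so the comparison fails.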

[cranuser6@controller-3 rcp-pod-up]$ kubectl describe pod yxp-up-deployment-0
...
  l2hi-container:
    Container ID: docker://ca0ab76e4152b445ba0678723d60ae329246a5dccce74528428fc889ccf2d281
    Image: rcp-docker-containers-local.esisoj70.emea.nsn-net.net/ccs-rt/ccs-rt-dpm-trs:2.27.0
    Image ID: docker-pullable://rcp-docker-containers-local.esisoj70.emea.nsn-net.net/ccs-rt/ccs-rt-dpm-trs@sha256:58a980c3e856a83f31d2cb077194a79bc6e17c846b4b9eed2f0968697ab247b6
    Port:
    Host Port:
    Command:
      /opt/bin/process-starter
    Args:
      /bin/bash
      -c
      ministarter
    State: Running
      Started: Wed, 08 Jul 2020 09:25:50 +0800
    Last State: Terminated
      Reason: Error
      Exit Code: 1
      Started: Wed, 08 Jul 2020 09:25:34 +0800
      Finished: Wed, 08 Jul 2020 09:25:44 +0800
    Ready: True
    Restart Count: 1
    Limits:
      cpu: 0
      hugepages-1Gi: 5Gi
      memory: 1224Mi
      nokia.k8s.io/exclusive_caas: 3
      nokia.k8s.io/shared_caas: 200
      nokia.k8s.io/sriov_vfio_ens1f1: 2
    Requests:
      cpu: 0
      hugepages-1Gi: 5Gi
      memory: 1224Mi
      nokia.k8s.io/exclusive_caas: 3
      nokia.k8s.io/shared_caas: 200
      nokia.k8s.io/sriov_vfio_ens1f1: 2

Levovar commented 4 years ago

@YanXiaoping2 this issue should be solved by https://github.com/nokia/CPU-Pooler/pull/43. NCIR does not include the latest Pooler version, so you should manually upgrade the component in your environment from the latest master and retest the scenario.

YanXiaoping2 commented 4 years ago

OK, thank you. I will ask the host team for support. You can close this ticket, as it will take time to get the upgrade from the host team and verify it. We can reopen it or report a new issue if the problem still happens after the upgrade.