nokia / CPU-Pooler

A Device Plugin for Kubernetes, which exposes the CPU cores as consumable Devices to the Kubernetes scheduler.
BSD 3-Clause "New" or "Revised" License

does not match to expected cpuset #68

Closed antzjm closed 3 years ago

antzjm commented 3 years ago

Describe the bug: The pod cannot get its CPU cores and restarts.

Used CPU Pool(s): exclusive&shared
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) expected cpuset (4,12,24,26)
Cgroup cpuset (0-39) does not match to expected cpuset (4,12,24,26)
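
The log compares the CPU list the container's cgroup actually has (0-39, i.e. all 40 cores) with the list the container was expected to be pinned to (4,12,24,26). The Go sketch below shows one way such a comparison could work. It is illustrative only and is not CPU-Pooler's code: the names parseCPUSet and sameCPUSet are hypothetical, and it assumes both sides use the Linux cpuset list format (comma-separated CPU IDs and inclusive ranges).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUSet (hypothetical helper) expands a cpuset list string such as
// "0-3,8" into a sorted, de-duplicated slice of CPU IDs.
func parseCPUSet(list string) ([]int, error) {
	seen := map[int]bool{}
	for _, part := range strings.Split(strings.TrimSpace(list), ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate empty input and stray commas
		}
		bounds := strings.SplitN(part, "-", 2)
		start, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("bad CPU ID %q: %w", bounds[0], err)
		}
		end := start // a single ID is a range of length one
		if len(bounds) == 2 {
			end, err = strconv.Atoi(bounds[1])
			if err != nil {
				return nil, fmt.Errorf("bad CPU ID %q: %w", bounds[1], err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("bad CPU range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			seen[cpu] = true
		}
	}
	cpus := make([]int, 0, len(seen))
	for cpu := range seen {
		cpus = append(cpus, cpu)
	}
	sort.Ints(cpus)
	return cpus, nil
}

// sameCPUSet (hypothetical helper) reports whether two cpuset list strings
// describe exactly the same set of CPUs, regardless of order or notation.
func sameCPUSet(actual, expected string) (bool, error) {
	a, err := parseCPUSet(actual)
	if err != nil {
		return false, err
	}
	e, err := parseCPUSet(expected)
	if err != nil {
		return false, err
	}
	if len(a) != len(e) {
		return false, nil
	}
	for i := range a {
		if a[i] != e[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Values taken from the log above: the cgroup still spans all cores.
	match, err := sameCPUSet("0-39", "4,12,24,26")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("cpuset matches:", match)
}
```

With the values from the log, the two sets differ (40 CPUs against 4), which corresponds to the final "does not match" line: the cgroup was never narrowed from the node-wide default to the pinned cores.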


Relevant SW info: release=NCI_R19-33 build=6021.ga8897f5 cpu-pooler-0.3.1-15

antzjm commented 3 years ago

I have seen an old issue about this behavior, but it still happens with the latest cpu-pooler version.

antzjm commented 3 years ago

Hello, CPU-Pooler support team. This issue happens every time. Can you help us find the root cause? Here is the error log:
I0621 19:41:50.486854 1 webhook.go:315] Patch container for pinning l2rt-container

Levovar commented 3 years ago

see https://github.com/nokia/CPU-Pooler/issues/64#issuecomment-828348095

Yes, it does happen intermittently, and it will continue to do so until https://github.com/nokia/CPU-Pooler/pull/62 is concluded.

Levovar commented 3 years ago

#70 aims to handle this issue; testing is still pending.

Levovar commented 3 years ago

#70 implements the required architectural change for guaranteed performance. No container restarts should happen during deployment now.

Note that the work is not yet finished: during the refactoring we had to break the functionality that ensures restarted containers also get their desired cpuset back. Feedback on the initial deployment scenario with the new patch is appreciated while I work on the next PR.

Levovar commented 3 years ago

#73 completes the refactoring!

Please update to the latest commit and retry. The issue should not be seen again.