kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

:bug: When using the systemd cgroup type, got error "failed to find relative path of suffix" #714

Open kingeasternsun opened 3 weeks ago

kingeasternsun commented 3 weeks ago

What happened?

When we start katalyst-agent with cgroup-type=systemd added to customArgs, we get errors like these:

E1106 05:11:08.271848 1 client_pod.go:62] [katalyst-core/pkg/metaserver/agent/metric/provisioner/malachite/client.(*MalachiteClient).GetPodStats] GetPodStats err GetPodContainerStats 2e52c138-d3b0-4886-ab57-6b1a22f4b276/80df6a1f0ce4e8d6eeb4b048f14789c6bccc7b93660d74b3e899a6387be53a2b get-relative-path err failed to find relative path of suffix: pod2e52c138-d3b0-4886-ab57-6b1a22f4b276/80df6a1f0ce4e8d6eeb4b048f14789c6bccc7b93660d74b3e899a6387be53a2b, error:

E1106 05:11:11.271212 1 manager_linux.go:140] [cgroupIDManagerImpl.addAbsentCgroupIDsToCache] get cgroup id failed, pod: 2e52c138-d3b0-4886-ab57-6b1a22f4b276, container: 80df6a1f0ce4e8d6eeb4b048f14789c6bccc7b93660d74b3e899a6387be53a2b, err: GetContainerAbsCgroupPath failed, err: failed to find absolute path of suffix: pod2e52c138-d3b0-4886-ab57-6b1a22f4b276/80df6a1f0ce4e8d6eeb4b048f14789c6bccc7b93660d74b3e899a6387be53a2b, error:

However, on our node, the CPU cgroup of this pod is the one shown in this screenshot:

[Screenshot: 20241106-205615]
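
For context, the suffix in the errors above has the cgroupfs-style shape pod<uid>/<containerID>, whereas the kubelet systemd cgroup driver usually nests pods under .slice units and replaces the dashes in the pod UID with underscores. The Go sketch below contrasts the two layouts. The function names, the "burstable" QoS class, and the "cri-containerd" scope prefix are assumptions made for illustration (the prefix depends on the container runtime), and this is not katalyst-core's actual lookup code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cgroupfsSuffix reproduces the suffix shape seen in the error log:
// pod<uid>/<containerID>. Hypothetical helper, for illustration only.
func cgroupfsSuffix(podUID, containerID string) string {
	return path.Join("pod"+podUID, containerID)
}

// systemdSuffix builds the path the kubelet systemd cgroup driver typically
// creates. qosClass is "burstable", "besteffort", or "" for guaranteed pods.
// runtimePrefix is an assumption: "cri-containerd" for containerd; other
// runtimes use other prefixes. Hypothetical helper, for illustration only.
func systemdSuffix(qosClass, podUID, containerID, runtimePrefix string) string {
	// systemd unit names use '-' as a hierarchy separator, so the kubelet
	// replaces the dashes inside the pod UID with underscores.
	uid := strings.ReplaceAll(podUID, "-", "_")

	parts := []string{"kubepods.slice"}
	prefix := "kubepods"
	if qosClass != "" {
		prefix = prefix + "-" + qosClass
		parts = append(parts, prefix+".slice")
	}
	parts = append(parts, prefix+"-pod"+uid+".slice")
	parts = append(parts, runtimePrefix+"-"+containerID+".scope")
	return path.Join(parts...)
}

func main() {
	podUID := "2e52c138-d3b0-4886-ab57-6b1a22f4b276"
	containerID := "80df6a1f0ce4e8d6eeb4b048f14789c6bccc7b93660d74b3e899a6387be53a2b"

	fmt.Println("cgroupfs-style:", cgroupfsSuffix(podUID, containerID))
	fmt.Println("systemd-style: ", systemdSuffix("burstable", podUID, containerID, "cri-containerd"))
}
```

If the agent searches for the cgroupfs-style suffix on a node that uses the systemd layout, no directory would match, which would be consistent with the "failed to find relative path of suffix" errors reported here.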

What did you expect to happen?

GetContainerAbsCgroupPath and GetPodContainerStats should resolve the correct cgroup path for the pod's containers.

How can we reproduce it (as minimally and precisely as possible)?

Start katalyst-agent with cgroup-type=systemd set in customArgs.

Software version

v5.0
