koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0

[BUG] #1218

Closed chaikebin closed 1 year ago

chaikebin commented 1 year ago

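What happened: I started a BE Pod with a CPU request of 1 and a memory request of 11G, with no CPU limit and no memory limit. The Pod runs stress. After the Pod started, it behaved normally at first, but after about 20 seconds it could no longer get any CPU resources. (Utilization on this machine is very low, and no LS, LSE, or LSR Pods are running on it.)

Node CPU utilization is below 1% on 96 cores. Running the same workload as a native Kubernetes BestEffort Pod works fine: the CPU of the Pod running stress stays at 100%.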

What you expected to happen: The BE Pod should be able to get 100% of the CPU.

Anything else we need to know?: Cgroup V2

Environment:

saintube commented 1 year ago

/area koordlet

@chaikebin This bug should be fixed after #1222. It does not happen when a Batch pod sets the limits of batch resources. Please check the PR if you have any questions.
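For readers who want to try the workaround mentioned above (setting limits on the batch resources), here is a minimal sketch in Python that builds a Pod manifest with limits equal to requests. The issue does not include the reporter's Pod spec, so everything here is an assumption: the resource names `kubernetes.io/batch-cpu` (in milli-cores) and `kubernetes.io/batch-memory`, the `koordinator.sh/qosClass: BE` label, the `koord-batch` priority class, and the `koord-scheduler` scheduler name are taken from Koordinator's colocation conventions as I understand them, and the Pod name and image are hypothetical placeholders.

```python
import json

# Assumed Koordinator batch resource names; verify against your installed version.
BATCH_CPU = "kubernetes.io/batch-cpu"        # assumed unit: milli-cores
BATCH_MEMORY = "kubernetes.io/batch-memory"  # assumed unit: Kubernetes quantity


def batch_pod_manifest(name, image, milli_cpu, memory):
    """Build a BE Pod manifest whose batch limits equal its batch requests."""
    resources = {BATCH_CPU: str(milli_cpu), BATCH_MEMORY: memory}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"koordinator.sh/qosClass": "BE"},  # assumed QoS label
        },
        "spec": {
            "schedulerName": "koord-scheduler",   # assumed scheduler name
            "priorityClassName": "koord-batch",   # assumed priority class
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "requests": dict(resources),
                        # Setting limits explicitly is the workaround described above.
                        "limits": dict(resources),
                    },
                }
            ],
        },
    }


# Hypothetical values mirroring the report: 1 CPU and 11G of memory.
manifest = batch_pod_manifest(
    "stress-be", "example.invalid/stress:latest", 1000, "11Gi"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.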