koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0

[proposal] Support Prod resource overcommitment #1428

Open saintube opened 1 year ago

saintube commented 1 year ago

After the work on the Mid-tier resource #1361 and node peak prediction #1385, Koordinator can estimate the future peak usage of a node and of the pods in each priority class. In some scenarios, we want to submit pods whose total requests exceed the node allocatable, where all the pods are long-running and may share the same priority class. This requires a mechanism for overcommitting Prod resources that builds on the peak prediction capability.
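One way to picture peak-based overcommitment is an admission check that compares predicted peaks, rather than requests, against the node allocatable. The sketch below is hypothetical and is not Koordinator's implementation. The type `PodEstimate`, the function `admitsWithOvercommit`, the `safetyMargin` factor, and the `overcommitRatio` cap on total requests are all illustrative assumptions, and CPU is the only resource modeled, in millicores.

```go
package main

import "fmt"

// PodEstimate is a hypothetical per-pod record: the CPU request from the pod
// spec and a predicted peak usage (e.g. from a peak predictor), in millicores.
type PodEstimate struct {
	Name          string
	RequestMilli  int64
	PredictedPeak int64
}

// admitsWithOvercommit is a hypothetical admission check (an assumption, not
// Koordinator's API). Instead of requiring sum(requests) <= allocatable, it
// requires the sum of predicted peaks (existing pods plus the candidate),
// scaled by a safety margin, to fit in allocatable. Total requests are still
// capped at overcommitRatio * allocatable.
func admitsWithOvercommit(allocatable int64, existing []PodEstimate, candidate PodEstimate, safetyMargin, overcommitRatio float64) bool {
	var requested, peak int64
	for _, p := range existing {
		requested += p.RequestMilli
		peak += p.PredictedPeak
	}
	requested += candidate.RequestMilli
	peak += candidate.PredictedPeak

	// Reject if total requests exceed the configured overcommit cap.
	if float64(requested) > overcommitRatio*float64(allocatable) {
		return false
	}
	// Admit only if the padded sum of predicted peaks fits in allocatable.
	return float64(peak)*safetyMargin <= float64(allocatable)
}

func main() {
	existing := []PodEstimate{
		{Name: "web-a", RequestMilli: 4000, PredictedPeak: 2500},
		{Name: "web-b", RequestMilli: 4000, PredictedPeak: 2000},
	}
	candidate := PodEstimate{Name: "web-c", RequestMilli: 4000, PredictedPeak: 2000}
	// Allocatable is 8 cores. Requests total 12 cores (150%), but predicted
	// peaks total 6.5 cores, about 7.15 cores with a 1.1 margin.
	fmt.Println(admitsWithOvercommit(8000, existing, candidate, 1.1, 1.5)) // true
}
```

The request cap keeps the overcommitment bounded even when the predictor underestimates, which is a design choice assumed here for safety rather than something stated in the proposal.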

Supporting Prod overcommitment requires the following work:

saintube commented 1 year ago

/area koordlet
/area koord-manager
/area koord-scheduler

zwzhang0107 commented 1 year ago

Be careful when decreasing the allocatable field. If the total requests on a node exceed its allocatable, pods will be evicted when the kubelet restarts. Maybe use the descheduler to handle this.
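Following the descheduler suggestion, the sketch below checks whether a lowered allocatable would leave the node with requests > allocatable and, if so, picks pods to move away before a kubelet restart could evict them. This is a hypothetical illustration, not Koordinator's or the descheduler's actual logic. `PodRequest`, `podsToMigrate`, and the victim-selection policy (lowest priority first, then largest request) are assumptions, and only CPU in millicores is modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// PodRequest is a hypothetical record of a pod's priority and CPU request (millicores).
type PodRequest struct {
	Name         string
	Priority     int32
	RequestMilli int64
}

// podsToMigrate returns pods that could be moved off the node (e.g. by a
// descheduler) so that the remaining requests fit in newAllocatable.
// Assumed policy: lower-priority pods first, larger requests first among equal
// priority. Returns nil when requests already fit.
func podsToMigrate(pods []PodRequest, newAllocatable int64) []string {
	var requested int64
	for _, p := range pods {
		requested += p.RequestMilli
	}
	if requested <= newAllocatable {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]PodRequest(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Priority != sorted[j].Priority {
			return sorted[i].Priority < sorted[j].Priority
		}
		return sorted[i].RequestMilli > sorted[j].RequestMilli
	})
	var victims []string
	for _, p := range sorted {
		if requested <= newAllocatable {
			break
		}
		victims = append(victims, p.Name)
		requested -= p.RequestMilli
	}
	return victims
}

func main() {
	pods := []PodRequest{
		{Name: "prod-a", Priority: 9000, RequestMilli: 4000},
		{Name: "prod-b", Priority: 9000, RequestMilli: 2000},
		{Name: "batch-c", Priority: 5000, RequestMilli: 3000},
	}
	// Allocatable is lowered to 6 cores while requests total 9 cores.
	fmt.Println(podsToMigrate(pods, 6000)) // [batch-c]
}
```

Moving pods proactively trades a controlled, rate-limited migration for the uncontrolled evictions that would otherwise happen on kubelet restart.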