[proposal] Support Prod resource overcommitment

saintube commented 1 year ago

Following the work on Mid-tier resources (#1361) and node peak prediction (#1385), Koordinator can estimate the future peak usage of a node and of its pods in different priority classes. In some scenarios, we want to schedule more pods than the node allocatable permits, where all of the pods may be long-running and may share the same priority class. This requires a mechanism that overcommits Prod resources, backed by peak prediction.

The following work items are needed to support Prod overcommitment:

[ ] Define the API for Prod overcommitment.
[ ] Implement the resource overcommitment scaling mechanism.
[ ] Implement the resource calculation and updating in the koord-manager.
[ ] (optional) Enhance the scheduler with peak prediction.
[ ] (optional) Improve the node prediction in koordlet with a more conservative estimation and less ledger lagging.
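
To make the scaling item above concrete, here is a minimal sketch of how an overcommitted allocatable could be derived from the raw node allocatable. The proposal has not defined an API yet, so `NodeResource`, `Amplify`, and the percent-based ratios are hypothetical names chosen for illustration. Where the ratio comes from (a static setting or the peak prediction) is left open here.

```go
package main

import "fmt"

// NodeResource is a hypothetical container for node resources.
// CPU is in milli-cores and memory is in bytes.
type NodeResource struct {
	MilliCPU int64
	Memory   int64
}

// Amplify scales the raw allocatable by per-resource ratios given in percent.
// A ratio of 100 leaves the resource unchanged; 150 overcommits it by 50%.
// Integer arithmetic is used to avoid floating-point rounding.
func Amplify(raw NodeResource, cpuPercent, memPercent int64) NodeResource {
	return NodeResource{
		MilliCPU: raw.MilliCPU * cpuPercent / 100,
		Memory:   raw.Memory * memPercent / 100,
	}
}

func main() {
	// Assumed example node: 8 cores and 16 GiB of raw allocatable.
	raw := NodeResource{MilliCPU: 8000, Memory: 16 << 30}

	// Overcommit CPU by 50% and keep memory unchanged.
	scaled := Amplify(raw, 150, 100)
	fmt.Printf("cpu=%dm memory=%d bytes\n", scaled.MilliCPU, scaled.Memory)
}
```

Keeping memory at 100 in the example reflects a common caution: CPU is compressible, while memory overcommitment can end in OOM kills. The proposal itself does not state a per-resource policy.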

saintube commented 1 year ago

/area koordlet
/area koord-manager
/area koord-scheduler

zwzhang0107 commented 1 year ago

Be careful when decreasing the allocatable field. If requested > allocatable, pods will be evicted when the kubelet restarts. Maybe use the descheduler to handle this.
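
One way to picture this concern is a guard that refuses to publish an allocatable below what is already requested. The sketch below is only an assumption about how such a check might look; `SafeAllocatable` is a hypothetical helper, not Koordinator code. When the value is clamped, the node stays overcommitted relative to the desired target, and some other component (for example a descheduler) would have to migrate pods before the value can be lowered further.

```go
package main

import "fmt"

// SafeAllocatable returns the value to publish when the controller wants to
// lower a node's allocatable to `desired` while `requested` is already in use
// by scheduled pods. It never returns less than `requested`, so the condition
// requested > allocatable cannot be introduced by this update.
// The second result reports whether the desired value had to be clamped.
func SafeAllocatable(desired, requested int64) (int64, bool) {
	if desired < requested {
		return requested, true
	}
	return desired, false
}

func main() {
	// Assumed example: pods request 10 cores, the controller wants 9 cores.
	value, clamped := SafeAllocatable(9000, 10000)
	fmt.Printf("publish=%dm clamped=%v\n", value, clamped)
}
```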

[proposal] Support Prod resource overcommitment #1428