koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0
1.31k stars 326 forks

[proposal] Report node allocatable RDT ctrl groups #2040

Open saintube opened 4 months ago

saintube commented 4 months ago

What is your proposal:

The koordlet should report the node's allocatable number of resctrl/RDT control (ctrl) groups, so that koordlet modules and other components are aware of the resctrl capacity and can guarantee allocation.

Why is this needed:

#756, #1798.

The number of resctrl/RDT groups a machine supports is limited and is normally much smaller than the number of pods on a node. To ensure that resctrl groups are available when they need to be set up for pods, the node's allocatable number of resctrl groups should be collected and reported. With that capacity known, we can also schedule guaranteed resctrl groups for pods in a capacity-aware way, and cooperate with the QoS strategy, which may reserve some node-level groups.

Is there a suggested solution, if so, please add it:

songtao98 commented 2 months ago

/assign @kangclzjc