Open ZiMengSheng opened 3 months ago
/area koord-scheduler
ref #2187 GPU & RDMA Joint Allocation
ref #2171 GPU monitoring cannot perceive central scheduling results
ref #583 GPU sharing and isolation solution
/area koordlet /area koord-manager
I am the maintainer of HAMi and I look forward to in-depth cooperation with koordinator in the area of device management.
Hello, I'm the issue planner, nice to meet you on GitHub! I have some questions about HAMi:
What is your proposal:
Provide an evolvable End to End Solution for Koordinator Device Management
Why is this needed:
Koordinator already supports two device-related functions in the scheduler: GPU shared scheduling and GPU & RDMA joint allocation. Users apply for GPU or RDMA resources through Kubernetes extended resources and hints defined in Pod annotations. Extended resources were originally introduced into Kubernetes mainly to describe discrete, countable node resources, and the Kubelet Device Plugin interface is the main way the Kubernetes community supports reporting and allocating such resources.
However, the allocation logic of the Kubelet Device Manager does not support fine-grained joint allocation of multiple resources according to device topology, for example when a GPU and an RDMA device need to be allocated under the same PCIe switch. The only topology-aware allocation Kubelet supports is allocation by NUMA node. Even when only NUMA-based allocation is required, Kubelet intervenes too late: users only run into the topology mismatch, and the resulting performance degradation, after the pod has already been scheduled.
To solve this problem, Koordinator moved the device allocation logic from Kubelet to the scheduler and used cri-runtime-proxy on the node side to set up device isolation and visibility. However, the cri-runtime-proxy approach is heavyweight and inconvenient to install. In addition, although the Koordinator scheduler provides GPU & RDMA joint allocation, no complete end-to-end solution is available; on the node side in particular, it has not yet been integrated with the community-standard RDMA logic. This proposal attempts to solve these problems and provide a feasible end-to-end solution for Koordinator.
Finally, in the field of device management, the community proposed Dynamic Resource Allocation (DRA) after the Device Plugin interface to overcome the limitations of the current Device Plugin solution. This proposal will also show how Koordinator's GPU sharing and GPU & RDMA joint allocation can be implemented in DRA mode, and how the current solution evolves toward DRA.
Key Results: