Closed MondayCha closed 1 week ago
Welcome @MondayCha!
It looks like this is your first PR to volcano-sh/devices.
Thank you, and welcome to Volcano. :smiley:
Hi, please add more description about this PR, and use git commit -s to sign off your commit.
Thanks for your contribution. I opened issue #69 for this PR.
@MondayCha Would you like to add a doc explaining how to configure and use it?
/ok-to-test
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
/lgtm
Motivation
Volcano v1.9.0 introduces Capacity scheduling capabilities, which makes it possible to configure different quotas for different types of GPU queues (important in production environments). For example:
However, the default NVIDIA Device Plugin reports resources as `nvidia.com/gpu`, which does not support reporting different GPU models as shown in the example. To address this, we need to customize the device plugin.
Change Details
The NVIDIA community has already had discussions about this issue:
The changes in this PR are based on the above discussion.
Further Impact
GPU resource renaming will prevent the DCGM Exporter from obtaining pod-level GPU resource usage monitoring, since the DCGM Exporter must exactly match the resource name `nvidia.com/gpu` or those with a prefix of `nvidia.com/mig-`.
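
To make the impact concrete, here is a minimal Go sketch of the matching rule described above. It is not the DCGM Exporter's actual code: the helper name `isRecognizedByExporter` and the renamed resource `nvidia.com/a100` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Resource name reported by the default NVIDIA device plugin.
	gpuResourceName = "nvidia.com/gpu"
	// Prefix used for MIG resources.
	migResourcePrefix = "nvidia.com/mig-"
)

// isRecognizedByExporter is a hypothetical helper that mirrors the rule
// stated in this PR description: a resource is attributed to a pod only if
// its name is exactly nvidia.com/gpu or starts with nvidia.com/mig-.
func isRecognizedByExporter(resourceName string) bool {
	return resourceName == gpuResourceName ||
		strings.HasPrefix(resourceName, migResourcePrefix)
}

func main() {
	names := []string{
		"nvidia.com/gpu",        // default name: matched
		"nvidia.com/mig-1g.5gb", // MIG prefix: matched
		"nvidia.com/a100",       // hypothetical renamed resource: not matched
	}
	for _, name := range names {
		fmt.Printf("%-24s recognized=%v\n", name, isRecognizedByExporter(name))
	}
}
```

Under this rule, any renamed resource fails the check, so clusters that rely on pod-level DCGM metrics would need to account for this before enabling renaming.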