What is your proposal:
Currently, when a pod requests a GPU card without a specific gpu-memory request, the DeviceShare scheduling plugin assumes that all GPUs on a node have the same memory capacity and arbitrarily picks one GPU's capacity as the pod's gpu-memory request. This assumption breaks when a node has GPUs with different memory sizes, and such a pod may then fail to be allocated to the remaining GPUs.
Since a pod sometimes requests only a GPU card and does not care about the GPU memory size, DeviceShare should allow this kind of allocation.
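For illustration, a pod spec along these lines requests a whole GPU card without stating any gpu-memory quantity, so the scheduler has to infer one. This is a minimal sketch, not taken from this issue; the resource name `nvidia.com/gpu` and the image are assumptions and may differ in your cluster.

```yaml
# Hypothetical pod: asks for one whole GPU card, but sets no
# koordinator.sh/gpu-memory request, so DeviceShare must decide
# which card (and implicitly which memory size) it gets.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-card-only
spec:
  containers:
    - name: main
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # assumed image for illustration
      resources:
        limits:
          nvidia.com/gpu: 1   # whole card requested; no gpu-memory quantity given
```

On a node that mixes, say, 16Gi and 24Gi GPUs, the current behavior may pin this pod's implied gpu-memory request to one size and leave the other cards unusable for it.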
Why is this needed:
Is there a suggested solution, if so, please add it: