Open shaowei-su opened 2 months ago
This is not a KubeRay-specific issue. See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/gpu.html#gpu-multi-tenancy for more details. The GPU UX on K8s seems to have improved recently. I will take a look at MIG and GPU time-slicing and get back to you.
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
If the Ray head node is scheduled on a GPU node with no GPU resources requested, e.g.
the Ray resource scheduler can still access those GPUs and treats all of the host's GPUs as "Logical Resources" during scheduling.
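To make the mismatch concrete, here is a minimal Python sketch of the arithmetic, not the actual cluster output. The host size (4 GPUs), the worker request (2 GPUs), and the helper names `logical_gpus`, `overcount`, and `reported_cluster_gpus` are all assumptions for illustration. It also assumes the head container can see every GPU on the host, which is the behavior described above.

```python
from typing import Dict


def logical_gpus(detected_per_node: Dict[str, int]) -> int:
    """Sum the GPU counts that each Ray node advertises to the scheduler."""
    return sum(detected_per_node.values())


def overcount(detected_per_node: Dict[str, int],
              requested_per_pod: Dict[str, int]) -> int:
    """Logical GPUs advertised beyond what the pods actually requested."""
    return logical_gpus(detected_per_node) - sum(requested_per_pod.values())


def reported_cluster_gpus() -> float:
    """Ask a running Ray cluster for its logical GPU total (needs ray installed)."""
    import ray  # imported lazily so the arithmetic above runs without Ray

    ray.init(address="auto")
    return ray.cluster_resources().get("GPU", 0)


# Hypothetical numbers: one host with 4 physical GPUs.
requested = {"head": 0, "worker": 2}  # GPU requests in the pod specs
detected = {"head": 4, "worker": 2}   # assumed: head sees all host GPUs

print(overcount(detected, requested))  # 4
```

On a live cluster, comparing `reported_cluster_gpus()` against the sum of the GPU requests in the pod specs should show whether the head node is contributing GPUs it never requested.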
Reproduction script
Use the RayJob CRD to schedule both the head and the workers on the same physical host with more than one GPU.
Anything else
No response
Are you willing to submit a PR?