Description
When multiple pods each request a portion of the GPUs on a node, the same GPU can be allocated more than once. As shown below, three pods are scheduled to the same node, which has 8 GPUs (id: 0,1,2,3,4,5,6,7). The three pods request 2, 2, and 4 GPUs respectively. A further condition is that the pods are not launched at the same time (i.e. they are not processed in the same Volcano scheduling loop).
The result is that GPUs 1 and 2 are allocated to pod1 and also to pod3, which I believe is incorrect.
kubectl -n ai-vc get po notebook-pod1 -o yaml
volcano.sh/gpu-index: 1,2
creationTimestamp: "2024-11-04T02:03:48Z"
kubectl -n ai-vc get po notebook-pod2 -o yaml
volcano.sh/gpu-index: 7,0
creationTimestamp: "2024-11-06T06:43:32Z"
kubectl -n ai-vc get po notebook-pod3 -o yaml
volcano.sh/gpu-index: 0,1,2,3
creationTimestamp: "2024-11-07T10:23:46Z"
Steps to reproduce the issue
1. Deploy Volcano (v1.7.0) and volcano-device-plugin correctly. Describe the GPU node and confirm that 4 "volcano.sh/gpu-number" resources are reported on it.
2. Launch 4 pods one by one at 3-second intervals, each requesting one GPU resource:
kubectl apply -f gpu-test-gpu3-1.yaml;sleep 3
kubectl apply -f gpu-test-gpu3-2.yaml;sleep 3
kubectl apply -f gpu-test-gpu3-3.yaml;sleep 3
kubectl apply -f gpu-test-gpu3-4.yaml;sleep 3
The test YAML is shown below:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-gpu3-1 # or 2, 3, 4
spec:
  restartPolicy: Never
  nodeSelector:
    kubernetes.io/hostname: gpu3
  schedulerName: volcano
  tolerations:
    - operator: "Exists"
  containers:
    - name: cuda-container
      image: 172.25.61.5:5000/volcano/cuda-sample:vectoradd-cuda10.2
      command: ["sleep", "1000"]
      resources:
        requests:
          volcano.sh/gpu-number: 1
        limits:
          volcano.sh/gpu-number: 1
3. Watch the status of the pods and the gpu-index allocated to each pod:
kubectl get po | grep gpu3
kubectl describe po gpu-test-gpu3-1 | grep gpu-index
kubectl describe po gpu-test-gpu3-2 | grep gpu-index
kubectl describe po gpu-test-gpu3-3 | grep gpu-index
kubectl describe po gpu-test-gpu3-4 | grep gpu-index
Describe the results you received and expected
Result received: all pods are running, but some pods are assigned the same gpu-index.
[root@master1 fhy]# kubectl get po | grep gpu3
gpu-test-gpu3-1 1/1 Running 0 13s
gpu-test-gpu3-2 1/1 Running 0 10s
gpu-test-gpu3-3 1/1 Running 0 6s
gpu-test-gpu3-4 1/1 Running 0 3s
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-1 | grep gpu-index
volcano.sh/gpu-index: 3
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-2 | grep gpu-index
volcano.sh/gpu-index: 0
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-3 | grep gpu-index
volcano.sh/gpu-index: 3
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-4 | grep gpu-index
volcano.sh/gpu-index: 0
Result expected: all pods running, with every pod assigned a different gpu-index, for example:
[root@master1 fhy]# kubectl get po | grep gpu3
gpu-test-gpu3-1 1/1 Running 0 12s
gpu-test-gpu3-2 1/1 Running 0 9s
gpu-test-gpu3-3 1/1 Running 0 6s
gpu-test-gpu3-4 1/1 Running 0 3s
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-1 | grep gpu-index
volcano.sh/gpu-index: 1
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-2 | grep gpu-index
volcano.sh/gpu-index: 0
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-3 | grep gpu-index
volcano.sh/gpu-index: 2
[root@master1 fhy]# kubectl describe po gpu-test-gpu3-4 | grep gpu-index
volcano.sh/gpu-index: 3
What version of Volcano are you using?
v1.7.0
Any other relevant information
I think the reason is:
ssn.Node[nodename].GPUDevices[dev-id] records the GPUs occupied by existing pods as well as the GPUs allocated in the current scheduling loop.
However, pods that request "volcano.sh/gpu-number" are not recorded correctly, so the GPUs they hold are mistakenly considered idle.
My solution is shown below:
// Before the fix: only pods that request gpu-memory are recorded on the device.
func (ni *NodeInfo) AddGPUResource(pod *v1.Pod) {
    gpuRes := GetGPUMemoryOfPod(pod)
    if gpuRes > 0 { // only considers pods using gpu-memory
        ids := GetGPUIndex(pod)
        for _, id := range ids {
            if dev := ni.GPUDevices[id]; dev != nil {
                dev.PodMap[string(pod.UID)] = pod
            }
        }
    }
}
// After the fix: pods that request gpu-memory or gpu-number are both recorded.
func (ni *NodeInfo) AddGPUResource(pod *v1.Pod) {
    gpuRes := GetGPUMemoryOfPod(pod)
    gpuNumRes := GetGPUNumberOfPod(pod) // also consider pods using gpu-number
    if gpuRes > 0 || gpuNumRes > 0 {
        ids := GetGPUIndex(pod)
        for _, id := range ids {
            if dev := ni.GPUDevices[id]; dev != nil {
                dev.PodMap[string(pod.UID)] = pod
            }
        }
    }
}
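The fix relies on GetGPUNumberOfPod, which is assumed here to sum the volcano.sh/gpu-number limits over the pod's containers (mirroring how GetGPUMemoryOfPod handles gpu-memory). A minimal sketch of that assumption is shown below; the helper that actually ships with Volcano may be implemented differently:

// GetGPUNumberOfPod sums the volcano.sh/gpu-number limits over all containers
// of the pod. This is only a sketch of the assumed behaviour; v1 refers to
// k8s.io/api/core/v1, as in AddGPUResource above.
func GetGPUNumberOfPod(pod *v1.Pod) int {
    gpuNumber := 0
    for _, container := range pod.Spec.Containers {
        if val, ok := container.Resources.Limits["volcano.sh/gpu-number"]; ok {
            gpuNumber += int(val.Value())
        }
    }
    return gpuNumber
}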