Since both the scheduling decision and the device assignment are made in the scheduler plug-in, the scheduling result is recorded in the Pod's Annotations, and the device plugin (dp) is not aware of it. When the Pod is created, kubelet calls the device plugin's Allocate method. Because kubelet's native code knows nothing about the scheduling result, kubelet itself picks device IDs with its own allocation algorithm, and the device IDs written into the Annotations cannot be looked up by PodName (the Allocate request does not carry it, even though the scheduler has already assigned them). For these reasons, a node-lock scheme can be considered: verify whether the Pod requests device resources; if so, write a node lock in the Pod's Bind phase so that only one Pod at a time holds the lock. The dp side can then query the Pod that is Pending on the current node, parse the device ID list assigned by the center side from that Pod's Annotations, do any further processing, and finally build the AllocateResponse. In the GPU scheduling scenario a server has at most eight GPU cards, so the number of such Pods per server is small, and the performance loss caused by the frequent lookups is small. Therefore, this option can be considered.
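The Bind-phase check-and-set described above can be sketched as follows. This is a minimal sketch over a plain annotations map: the annotation key, function names, and lock encoding are assumptions, not the plug-in's actual API; in real code the map would come from the Node object and the update would go through the API server, whose optimistic concurrency (resourceVersion) makes the check-and-set atomic.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical annotation key used to mark a node as locked; the real
// scheduler plug-in may use a different key.
const nodeLockKey = "example.io/mutex.lock"

// TryLockNode sketches the Bind-phase check-and-set: it writes a lock
// annotation onto the node's annotation map only if no other Pod holds it,
// so at most one Pod on the node is being allocated at a time.
func TryLockNode(nodeAnnotations map[string]string, podKey string) bool {
	if _, locked := nodeAnnotations[nodeLockKey]; locked {
		return false // another Pod is still being processed on this node
	}
	nodeAnnotations[nodeLockKey] = fmt.Sprintf("%s,%s", time.Now().Format(time.RFC3339), podKey)
	return true
}

// UnlockNode releases the lock once the device plugin has finished Allocate.
func UnlockNode(nodeAnnotations map[string]string) {
	delete(nodeAnnotations, nodeLockKey)
}

func main() {
	node := map[string]string{}
	fmt.Println(TryLockNode(node, "default/pod-a")) // true: lock acquired
	fmt.Println(TryLockNode(node, "default/pod-b")) // false: already held
	UnlockNode(node)
	fmt.Println(TryLockNode(node, "default/pod-b")) // true: acquired after release
}
```

Recording a timestamp in the lock value also gives a later watchdog a way to expire a lock left behind by a Pod that never reached Allocate.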
The general code idea:
On the dispatch center (scheduler) side, once scheduling succeeds and the assignment result is available, call in the Bind phase:
current, err := kubeClient.CoreV1().Pods(args.PodNamespace).Get(context.Background(), args.PodName, metav1.GetOptions{}) // get the current Pod object
LockNode(node, current) // add the node lock
In the device plugin's Allocate(ctx context.Context, reqs *v1beta1.AllocateRequest) (*v1beta1.AllocateResponse, error) method (v1beta1 here is k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1):
current, err := util.GetPendingPod(nodename) // traverse all Pods on the current node and find the Pending Pod
Resolve the assigned device IDs from current.Annotations.
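The Allocate side of the idea can be sketched as follows. The annotation key, the comma-separated encoding, and the local ContainerAllocateResponse stub are assumptions for illustration; in real code the response types come from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1, and the annotation format is whatever the center side actually writes.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical annotation key under which the center side publishes the
// assigned device IDs as a comma-separated list.
const assignedDevicesKey = "example.io/assigned-devices"

// ParseAssignedDevices extracts the device ID list that the scheduler wrote
// into the pending Pod's Annotations.
func ParseAssignedDevices(annotations map[string]string) ([]string, error) {
	raw, ok := annotations[assignedDevicesKey]
	if !ok || strings.TrimSpace(raw) == "" {
		return nil, fmt.Errorf("annotation %q missing or empty", assignedDevicesKey)
	}
	return strings.Split(raw, ","), nil
}

// ContainerAllocateResponse is a local stand-in for the per-container part of
// v1beta1.AllocateResponse; only the Envs field is sketched here.
type ContainerAllocateResponse struct {
	Envs map[string]string
}

// BuildContainerResponse turns the parsed IDs into the per-container response.
// Passing IDs via an environment variable such as NVIDIA_VISIBLE_DEVICES is
// one common choice for GPU plugins.
func BuildContainerResponse(ids []string) ContainerAllocateResponse {
	return ContainerAllocateResponse{
		Envs: map[string]string{
			"NVIDIA_VISIBLE_DEVICES": strings.Join(ids, ","),
		},
	}
}

func main() {
	// Stand-in for the Pending Pod's Annotations found via GetPendingPod.
	pending := map[string]string{assignedDevicesKey: "GPU-0,GPU-3"}
	ids, err := ParseAssignedDevices(pending)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)                              // [GPU-0 GPU-3]
	fmt.Println(BuildContainerResponse(ids).Envs) // map[NVIDIA_VISIBLE_DEVICES:GPU-0,GPU-3]
}
```

After building the response, the dp side would release the node lock so the next device-requesting Pod can be bound.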
Please consider this solution and, if it is feasible, implement it.