kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Pod priority was assigned 0 even though the priorityClassName of the PodGroup had been set #592

Closed Robin7831 closed 10 months ago

Robin7831 commented 10 months ago

Hi everybody, I've been testing mpi-operator v0.4.0 recently and found that MPIJobs can be scheduled by Volcano, so I modified the mpi-operator deployment and my MPIJob spec following https://github.com/kubeflow/website/pull/3453/files. However, although the PodGroup was created successfully and its priorityClassName was assigned correctly, the pods' priority is 0.

  1. Create the PriorityClass:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high
    value: 1000
    globalDefault: false
    description: "A high-priority class for important Pods."
    preemptionPolicy: PreemptLowerPriority
  2. Check the PriorityClass:
    NAME                      VALUE        GLOBAL-DEFAULT   AGE
    high                      1000         false            5h28m
    system-cluster-critical   2000000000   false            510d
    system-node-critical      2000001000   false            510d
  3. Check the mpi-operator deployment:
    spec:
      containers:
      - args:
        - --gang-scheduling=volcano
        - -alsologtostderr
        - --lock-namespace=mpi-operator
        image: myharbor/common/mpi-operator:0.4.0
  4. Create the MPIJob:
    apiVersion: kubeflow.org/v2beta1
    kind: MPIJob
    metadata:
      name: mpitest-helloworld
      namespace: mpi-operator
    spec:
      slotsPerWorker: 2
      runPolicy:
        cleanPodPolicy: Running
        schedulingPolicy:
          minAvailable: 1
          minResources:
            cpu: "4"
            memory: 16Gi
          priorityClass: high
          scheduleTimeoutSeconds: 300
      mpiReplicaSpecs:
        Launcher:
          replicas: 1
          template:
            spec:
              containers:
              - image: myharbor/common/mpi-base:testv0
                imagePullPolicy: Always
                name: hellompi-launcher
                command:
                - sleep
                - infinity
        Worker:
          replicas: 2
          template:
            spec:
              containers:
              - image: myharbor/common/mpi-base:testv0
                imagePullPolicy: Always
                name: hellompi-worker
                resources:
                  requests:
                    cpu: "2"
                    memory: 8Gi
                  limits:
                    cpu: "2"
                    memory: 8Gi
  5. Check the PodGroup:
    ---
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: PodGroup
    metadata:
      name: mpitest-helloworld
      namespace: mpi-operator
      ownerReferences:
      - apiVersion: kubeflow.org/v2beta1
        blockOwnerDeletion: true
        controller: true
        kind: MPIJob
        name: mpitest-helloworld
        uid: 23f17ad5-0f50-43cf-9a4d-0795a6cacee8
      resourceVersion: '115433952'
    spec:
      minMember: 1
      minResources:
        cpu: '4'
        memory: 16Gi
        nvidia.com/gpu: '1'
      priorityClassName: high
    status:
      conditions:
      - lastTransitionTime: '2023-09-07T07:04:35Z'
        reason: tasks in gang are ready to be scheduled
        status: 'True'
        transitionID: fefde024-6bc4-418a-b3b1-6b571beea9a3
        type: Scheduled
      phase: Running
      running: 2
  6. However, the priority of the pods is 0:
    [root@k8s-master1 ]# kubectl describe po -n mpi-operator mpitest-helloworld-launcher-59bqp
    Name:         mpitest-helloworld-launcher-59bqp
    Namespace:    mpi-operator
    Priority:     0
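
To make the mismatch explicit, the two objects can be compared directly (a quick check assuming the resource names above; podgroups.scheduling.volcano.sh is the Volcano CRD's full name):

    # The PodGroup did receive the class from .spec.runPolicy.schedulingPolicy.priorityClass
    kubectl get podgroups.scheduling.volcano.sh -n mpi-operator mpitest-helloworld \
      -o jsonpath='{.spec.priorityClassName}'
    # -> high

    # The launcher pod's spec has no priorityClassName, so its priority defaults to 0
    kubectl get po -n mpi-operator mpitest-helloworld-launcher-59bqp \
      -o jsonpath='{.spec.priority}'
    # -> 0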
tenzen-y commented 10 months ago

The mpi-operator passes the .spec.runPolicy.schedulingPolicy.priorityClass of the MPIJob only to the PodGroup resource and does not propagate it to the pods, so this behavior is expected. If you want the pods to have that priority, you must set a priorityClassName directly in the pod templates.
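
For example, a minimal sketch of the fix, reusing the job from above (priorityClassName is the standard Kubernetes pod-spec field, so it goes into each replica's pod template; unrelated fields are omitted):

    apiVersion: kubeflow.org/v2beta1
    kind: MPIJob
    metadata:
      name: mpitest-helloworld
      namespace: mpi-operator
    spec:
      mpiReplicaSpecs:
        Launcher:
          replicas: 1
          template:
            spec:
              priorityClassName: high   # set directly on the pod template
              containers:
              - name: hellompi-launcher
                image: myharbor/common/mpi-base:testv0
        Worker:
          replicas: 2
          template:
            spec:
              priorityClassName: high   # resolved by the API server to priority 1000
              containers:
              - name: hellompi-worker
                image: myharbor/common/mpi-base:testv0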

Robin7831 commented 10 months ago

Thanks a lot! I did miss that point, and I've just managed to fix it by setting priorityClassName directly on the pod templates.
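
After recreating the job with the field set, the resolved priority should show up on the pods (a quick check, assuming the names above):

    kubectl get po -n mpi-operator \
      -o custom-columns=NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority
    # the mpitest-helloworld pods should now report CLASS=high and PRIORITY=1000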