Open ameya-parab opened 1 week ago
Did you install training-operator with the `gang-scheduler-name` arg set to `volcano`? If not, you can use this as a reference: https://www.kubeflow.org/docs/components/training/user-guides/job-scheduling/#volcano-scheduler

By default, we use `kueue` as the gang scheduler. So even if you specified `schedulerName` in the pod template, training-operator will still use your `runPolicy` field for the kueue config.
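
For readers checking their own install, here is a rough sketch of what passing that argument could look like as a patch to the operator Deployment. The Deployment name, namespace, and container name are assumptions based on a default install and may differ in your cluster; the linked guide is the authoritative reference.

```yaml
# Sketch only: patch fragment for the training-operator Deployment.
# Assumed (not from this thread): name "training-operator", namespace "kubeflow".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: training-operator
  namespace: kubeflow
spec:
  template:
    spec:
      containers:
        - name: training-operator
          command:
            - /manager
            # Selects Volcano as the gang scheduler for the operator.
            - --gang-scheduler-name=volcano
```

If the argument is missing from the running operator pod, that would be the first thing to rule out before looking at the queue assignment itself.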
cc @kubeflow/wg-training-leads
What happened?
I am unable to use any custom queues created for use with the Volcano Scheduler for Kubeflow MPIJobs. When Volcano creates a PodGroup, it is automatically assigned to the `default` queue rather than the custom queue specified in the `runPolicy.schedulingPolicy.queue` spec.

The following MPIJob should use the custom queue `production`, but it instead uses the `default` queue.

Resultant PodGroup:
What did you expect to happen?
If `runPolicy.schedulingPolicy.queue` specifies a custom queue, the Volcano PodGroup should be assigned to that specific queue, not the `default` Volcano queue.

Environment
Kubernetes version: 1.25
Training Operator version: kubeflow/training-operator:v1-855e096
Training Operator Python SDK version: NA
Volcano version: 1.10.0
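
For context, the field this report refers to sits at the following place in an MPIJob manifest. This is an illustrative sketch, not the manifest from the report: the job name, container names, images, and replica counts are placeholders, and only the `runPolicy.schedulingPolicy.queue` value is taken from the description above.

```yaml
# Illustrative sketch only; all names and images are hypothetical placeholders.
apiVersion: kubeflow.org/v1
kind: MPIJob
metadata:
  name: example-mpijob
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: Running
    schedulingPolicy:
      # Custom Volcano queue the PodGroup is expected to land in.
      queue: production
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: mpi-launcher
              image: example.com/mpi-image:latest
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: mpi-worker
              image: example.com/mpi-image:latest
```

Under the expected behavior, the PodGroup that Volcano creates for such a job would carry `spec.queue: production` rather than `default`.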
Impacted by this bug?
Give it a 👍. We prioritize the issues with the most 👍.