Open KunWuLuan opened 2 days ago
You should install the Volcano scheduler.
I just want to enable the batch scheduler, and I have not submitted any job using the Volcano scheduler. Do I still need to install Volcano even if I just want to use Yunikorn?
If you want to use Yunikorn, you should use --batch-scheduler=yunikorn and not --enable-batch-scheduler. We deprecated --enable-batch-scheduler=true in favor of --batch-scheduler=yunikorn|volcano when we added Yunikorn support: https://github.com/ray-project/kuberay/pull/2300
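To make the flag behaviour described above concrete, here is a minimal Go sketch of how the two flags could be reconciled. It is not KubeRay's actual code: the helper name resolveScheduler is hypothetical, and two behaviours are assumptions inferred from this thread rather than confirmed, namely that the deprecated --enable-batch-scheduler=true maps to Volcano and that --batch-scheduler takes precedence when both are given.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveScheduler is a hypothetical helper (not part of KubeRay).
// It parses the two operator flags mentioned in this thread and returns
// the batch scheduler name, or "" if batch scheduling is disabled.
func resolveScheduler(args []string) (string, error) {
	fs := flag.NewFlagSet("ray-operator", flag.ContinueOnError)
	legacy := fs.Bool("enable-batch-scheduler", false, "deprecated, use --batch-scheduler")
	name := fs.String("batch-scheduler", "", "yunikorn or volcano")
	if err := fs.Parse(args); err != nil {
		return "", err
	}

	switch *name {
	case "yunikorn", "volcano":
		// Assumption: the new flag wins over the deprecated one.
		return *name, nil
	case "":
		if *legacy {
			// Assumption: the deprecated flag implies Volcano.
			return "volcano", nil
		}
		return "", nil
	default:
		return "", fmt.Errorf("unsupported batch scheduler %q", *name)
	}
}

func main() {
	for _, args := range [][]string{
		{"--batch-scheduler=yunikorn"},
		{"--enable-batch-scheduler=true"},
		{},
	} {
		scheduler, err := resolveScheduler(args)
		fmt.Printf("args=%v scheduler=%q err=%v\n", args, scheduler, err)
	}
}
```

Under these assumptions, starting the operator with only --enable-batch-scheduler=true would select Volcano, which matches the advice earlier in the thread to install Volcano in that case.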
Thanks for the reply, @andrewsykim. If I enable --batch-scheduler=yunikorn when using ray-operator, I can still set ray.io/scheduler-name=volcano on a RayJob, right? What will happen in this scenario?
If I enable --batch-scheduler=yunikorn when using ray-operator, I can still set ray.io/scheduler-name=volcano on RayJob, right?
No, I don't think so. KubeRay only supports one batch scheduler at a time. What is your use case? Are you trying to use both Yunikorn and Volcano?
@andrewsykim If KubeRay is started with the --batch-scheduler parameter, there should be no need for the user to set the RayCluster label ray.io/scheduler-name=***. Judging from the latest code, it still follows the old logic. If we enable --batch-scheduler=yunikorn when using ray-operator, we can still set ray.io/scheduler-name=volcano.
We provide a managed ray-operator for our users. Submitting RayJobs is up to our users, and they may make mistakes; no event is sent when the wrong schedulerName is set on a RayJob. If we only support one batch scheduler at a time, why do we let users choose the scheduler on a RayJob via ray.io/scheduler-name? Maybe they just need a label like ray.io/enable-batch-scheduler on the RayJob.
What do you think?
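As a rough illustration of the missing feedback described in this comment, the Go sketch below compares the ray.io/scheduler-name label on a resource with the scheduler the operator was started with and produces a warning message that could be surfaced as an event. The helper checkSchedulerLabel and the message wording are hypothetical; this is not existing KubeRay behaviour.

```go
package main

import "fmt"

// Label key taken from this thread.
const schedulerNameLabel = "ray.io/scheduler-name"

// checkSchedulerLabel is a hypothetical helper (not part of KubeRay).
// It returns a warning when the label requests a scheduler other than
// the one the operator is configured with, and "" when there is no conflict.
func checkSchedulerLabel(configured string, labels map[string]string) string {
	requested, ok := labels[schedulerNameLabel]
	if !ok || requested == configured {
		return ""
	}
	if configured == "" {
		return fmt.Sprintf("label %s=%s is set, but no batch scheduler is enabled on the operator",
			schedulerNameLabel, requested)
	}
	return fmt.Sprintf("label %s=%s does not match the operator's batch scheduler %q",
		schedulerNameLabel, requested, configured)
}

func main() {
	labels := map[string]string{schedulerNameLabel: "volcano"}
	if msg := checkSchedulerLabel("yunikorn", labels); msg != "" {
		// In a controller this message could be recorded as a Warning event.
		fmt.Println(msg)
	}
}
```

A check like this would let a managed operator report a mismatched label to the user instead of silently ignoring it.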
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
When I start the controller with
in VS Code, the controller is blocked by cache syncing:
Finally, the controller failed:
There is no PodGroup CRD in my cluster.
After I remove the code https://github.com/ray-project/kuberay/blob/bf21d2d01cf1c931136d869d2c8168aed07bc68c/ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler.go#L184-L186, the controller can be started.
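One possible way to avoid the blocked cache sync, sketched below in Go, is to check for the PodGroup kind before watching it and fail fast with a clear error. This is only an assumption about a fix: the thread does not show what the removed lines do, the resourceChecker interface and shouldWatchPodGroup helper are hypothetical, and a real operator would back the check with the Kubernetes discovery API rather than the in-memory map used here.

```go
package main

import "fmt"

// resourceChecker is a hypothetical abstraction over API discovery.
type resourceChecker interface {
	HasKind(groupVersion, kind string) (bool, error)
}

// staticChecker is an in-memory stand-in used only for this example.
type staticChecker map[string]bool

func (s staticChecker) HasKind(groupVersion, kind string) (bool, error) {
	return s[groupVersion+"/"+kind], nil
}

// shouldWatchPodGroup is a hypothetical helper (not part of KubeRay).
// It reports whether a PodGroup watch should be registered, and returns
// an error when Volcano is selected but its PodGroup CRD is missing.
func shouldWatchPodGroup(scheduler string, c resourceChecker) (bool, error) {
	if scheduler != "volcano" {
		return false, nil
	}
	const groupVersion = "scheduling.volcano.sh/v1beta1"
	ok, err := c.HasKind(groupVersion, "PodGroup")
	if err != nil {
		return false, err
	}
	if !ok {
		return false, fmt.Errorf("batch scheduler volcano requires the PodGroup CRD (%s), which is not installed", groupVersion)
	}
	return true, nil
}

func main() {
	emptyCluster := staticChecker{} // no Volcano CRDs installed
	watch, err := shouldWatchPodGroup("volcano", emptyCluster)
	fmt.Printf("watch=%v err=%v\n", watch, err)
}
```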
Reproduction script
Anything else
No response
Are you willing to submit a PR?