ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Bug] KubeRay operator error if installed with helm chart with value `batchScheduler.name` set to `volcano` #2428

Open MortalHappiness opened 2 weeks ago

MortalHappiness commented 2 weeks ago

Search before asking

KubeRay Component

ray-operator

What happened + What you expected to happen

If KubeRay is installed with `helm install kuberay-operator kuberay/kuberay-operator --version 1.2.2 --set batchScheduler.name=volcano`, the operator panics at startup with the following error:

{"level":"info","ts":"2024-10-06T03:16:19.757Z","logger":"setup","msg":"Feature flag batch-scheduler is enabled","scheduler name":"volcano"}
{"level":"info","ts":"2024-10-06T03:16:19.757Z","logger":"setup","msg":"Loaded feature gates","featureGates":{"RayClusterStatusConditions":false}}
{"level":"info","ts":"2024-10-06T03:16:19.757Z","logger":"setup","msg":"Flag watchNamespace is not set. Watch custom resources in all namespaces."}
{"level":"info","ts":"2024-10-06T03:16:19.757Z","logger":"setup","msg":"Setup manager"}
panic: podGroup CRD is required to exist in current cluster. error: customresourcedefinitions.apiextensions.k8s.io "podgroups.scheduling.volcano.sh" is forbidden: User "system:serviceaccount:default:kuberay-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope

goroutine 1 [running]:
github.com/ray-project/kuberay/ray-operator/controllers/ray.NewReconciler({_, _}, {_, _}, {{0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}}, ...)
    /home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/raycluster_controller.go:114 +0x30b
main.main()
    /home/runner/work/kuberay/kuberay/ray-operator/main.go:226 +0x12c8
Stream closed EOF for default/kuberay-operator-68f8d7665b-786rb (kuberay-operator)
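
The panic message already names the denied request: the operator's service account cannot `get` `customresourcedefinitions` in the `apiextensions.k8s.io` API group. As a rough illustration, the sketch below pulls those fields out of the message with a regular expression. `ParseForbidden` and `MissingPermission` are hypothetical names, not part of KubeRay or client-go, and the pattern is only an assumption based on the message seen in this log; the wording of Kubernetes error messages is not a stable interface.

```go
package main

import (
	"fmt"
	"regexp"
)

// MissingPermission holds the fields of one "forbidden" message.
// Hypothetical type for illustration only.
type MissingPermission struct {
	Name     string // object the request targeted
	User     string // identity that was denied
	Verb     string // denied verb, e.g. "get"
	Resource string // resource type
	APIGroup string // API group of the resource
}

// Pattern modeled on the single message in the log above (an assumption).
var forbiddenRe = regexp.MustCompile(
	`"([^"]+)" is forbidden: User "([^"]+)" cannot (\w+) resource "([^"]+)" in API group "([^"]*)"`)

// ParseForbidden extracts the denied request from msg.
// The second return value is false when msg does not match the pattern.
func ParseForbidden(msg string) (MissingPermission, bool) {
	m := forbiddenRe.FindStringSubmatch(msg)
	if m == nil {
		return MissingPermission{}, false
	}
	return MissingPermission{
		Name:     m[1],
		User:     m[2],
		Verb:     m[3],
		Resource: m[4],
		APIGroup: m[5],
	}, true
}

func main() {
	msg := `customresourcedefinitions.apiextensions.k8s.io "podgroups.scheduling.volcano.sh" is forbidden: User "system:serviceaccount:default:kuberay-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope`
	if p, ok := ParseForbidden(msg); ok {
		fmt.Printf("%s needs verb %q on %q in API group %q\n",
			p.User, p.Verb, p.Resource, p.APIGroup)
	}
}
```

The extracted verb, resource, and API group correspond to the fields of an RBAC rule, which may help when checking what the chart's Role or ClusterRole is missing.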

Reproduction script

Anything else

No response

Are you willing to submit a PR?

tinaxfwu commented 2 weeks ago

Hi @MortalHappiness, I can help with this issue.

kevin85421 commented 2 weeks ago

https://github.com/ray-project/kuberay/blob/release-1.2.2/helm-chart/kuberay-operator/templates/_helpers.tpl#L335

We should add the `podgroups` resource to the Role or ClusterRole when using Volcano.
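
To make the suggestion concrete, here is a minimal sketch of rules that are emitted only when the scheduler name is `volcano`. Helm templates are built on Go's `text/template`, so the sketch uses that package directly instead of Helm. Everything in it is an assumption for illustration: the `Values` struct, the `RenderRules` function, and the verb list for `podgroups` are hypothetical and not copied from `_helpers.tpl`. The `get` rule on `customresourcedefinitions` is included because that is the request the panic above reports as forbidden.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Values models only the one chart setting discussed in this issue.
// Hypothetical shape; the real chart reads .Values.batchScheduler.name.
type Values struct {
	BatchSchedulerName string
}

// rulesTmpl emits extra RBAC rules only when the name is "volcano".
// The podgroups verb list is an assumption, not taken from the chart.
const rulesTmpl = `{{- if eq .BatchSchedulerName "volcano" -}}
- apiGroups: ["scheduling.volcano.sh"]
  resources: ["podgroups"]
  verbs: ["create", "delete", "get", "list", "update", "watch"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get"]
{{- end }}`

// RenderRules executes the template and returns the rendered rules.
// It returns an empty string when no extra rules apply.
func RenderRules(v Values) (string, error) {
	t, err := template.New("rules").Parse(rulesTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, v); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, name := range []string{"", "volcano"} {
		out, err := RenderRules(Values{BatchSchedulerName: name})
		if err != nil {
			panic(err)
		}
		fmt.Printf("batchScheduler.name=%q:\n%s\n", name, out)
	}
}
```

Keying the condition on the scheduler name means the extra permissions are granted only to installations that actually use Volcano.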