ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0
963 stars 328 forks

Support Apache YuniKorn as one batch scheduler option #2184

Open yangwwei opened 3 weeks ago

yangwwei commented 3 weeks ago

Why are these changes needed?

Apache YuniKorn is a widely used batch scheduler for Kubernetes. This PR adds support for Apache YuniKorn as an option for scheduling Ray workloads.

The integration is very simple. Apache YuniKorn doesn't require any CR to be created, so the change in the job controller code is to automatically inject the required labels into Ray pods. Only two extra labels are needed.

When all pods carry these labels, the YuniKorn scheduler automatically recognizes that they belong to the same Ray application and schedules them in the given queue. The Ray workload can then benefit from all the batch scheduling features YuniKorn provides: https://yunikorn.apache.org/docs/next/get_started/core_features
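The label injection described here can be sketched roughly as follows. This is an illustration, not the PR's code: the actual change lives in the Go controller, and the label keys `applicationId` and `queue` are an assumption based on YuniKorn's documented pod labels, since the PR text above does not list them. The function name, pod names, application ID, and queue name are all hypothetical.

```python
from copy import deepcopy

# Assumed label keys: the PR description does not spell them out.
APP_ID_LABEL = "applicationId"
QUEUE_LABEL = "queue"


def inject_yunikorn_labels(pod, app_id, queue):
    """Return a copy of a pod manifest (plain dict) with both labels set."""
    patched = deepcopy(pod)
    labels = patched.setdefault("metadata", {}).setdefault("labels", {})
    labels[APP_ID_LABEL] = app_id  # groups pods into one application
    labels[QUEUE_LABEL] = queue    # queue the application is scheduled in
    return patched


# Hypothetical head and worker pods of one Ray cluster.
head = {"metadata": {"name": "demo-head", "labels": {"ray.io/node-type": "head"}}}
worker = {"metadata": {"name": "demo-worker-0"}}

# Every pod of the same Ray application gets the same ID and queue.
pods = [
    inject_yunikorn_labels(p, "ray-demo-0001", "root.default")
    for p in (head, worker)
]
```

Existing labels on a pod are preserved, and the input manifests are left unmodified, so the same helper can be applied to head and worker pod templates alike.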

Related issue number

https://github.com/ray-project/kuberay/issues/1457


kevin85421 commented 2 weeks ago

Hi @yangwwei, thank you for the PR! Are you in the Ray Slack workspace? My Slack handle is "Kai-Hsun Chen (ray team)". We can have a quick sync on Slack to discuss how the KubeRay/Ray community works (e.g., how to propose a new enhancement).

yangwwei commented 1 week ago

@kevin85421 please see the proposal: https://github.com/ray-project/enhancements/pull/53