ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Feature] Can RayJob run in a local cluster? #2206

Open shuaiyy opened 5 days ago

shuaiyy commented 5 days ago

Description

Run a job in a local cluster without building a remote RayCluster.

Use case

In my case, I have many small jobs that can each run on a single node with few resources and finish within 60s. With a RayCluster, building the cluster adds roughly another 60s before the job starts running its code.

Related issues

No response

andrewsykim commented 4 days ago

Run a job in a local cluster without building a remote RayCluster

A RayJob can run on an existing cluster using the clusterSelector field. This way you can create a single RayCluster and then run multiple RayJobs against it. Would that work for you?
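Here is a minimal sketch of what that looks like. It assumes the `ray.io/v1` RayJob API and that the existing RayCluster is matched through its `ray.io/cluster` label. The manifest is built as a plain Python dict, which could be dumped to YAML or passed to a Kubernetes client. The helper `rayjob_on_existing_cluster` and the names `shared-cluster` and `main.py` are hypothetical.

```python
import json


def rayjob_on_existing_cluster(job_name: str, cluster_name: str, entrypoint: str) -> dict:
    """Build a RayJob manifest that reuses an existing RayCluster (hypothetical helper)."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": job_name},
        "spec": {
            "entrypoint": entrypoint,
            # No rayClusterSpec: the job targets the cluster matched by this label (assumed selector).
            "clusterSelector": {"ray.io/cluster": cluster_name},
        },
    }


if __name__ == "__main__":
    # Several short jobs submitted against the same long-lived cluster.
    for i in range(3):
        manifest = rayjob_on_existing_cluster(f"small-job-{i}", "shared-cluster", "python main.py")
        print(json.dumps(manifest, indent=2))
```

Because no `rayClusterSpec` is set, the jobs should share one long-lived cluster instead of each waiting for a new one to come up.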

shuaiyy commented 4 days ago

Run a job in a local cluster without building a remote RayCluster

You can create a single RayCluster and then run multiple RayJobs against the RayCluster.

Thanks, but that doesn't work in my case. First, we want jobs to run quickly and return results, and we use several images with different pre-installed dependencies. Second, even with an existing autoscaling (HPA) cluster, KubeRay still needs to create a Kubernetes Job to submit the job. If there are not enough resources, scaling up the RayCluster costs additional seconds.

Could we run the RayJob inside the submitter Job's Pod?

andrewsykim commented 4 days ago

KubeRay still needs to create a Kubernetes Job to submit the job

There's an HTTP submission mode that doesn't use the submitter Job: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L92
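Here is a minimal sketch of switching modes. It assumes the linked field is `submissionMode`, with two values:

- `K8sJobMode`: the default, which creates the submitter Job.
- `HTTPMode`: the operator submits the job to the Ray dashboard over HTTP.

The helper `set_submission_mode` is hypothetical. The base manifest leaves out the `rayClusterSpec`/`clusterSelector` part.

```python
import copy

# Assumed values of the RayJob submissionMode field.
SUBMISSION_MODES = ("K8sJobMode", "HTTPMode")


def set_submission_mode(manifest: dict, mode: str = "HTTPMode") -> dict:
    """Return a copy of a RayJob manifest with spec.submissionMode set (hypothetical helper)."""
    if mode not in SUBMISSION_MODES:
        raise ValueError(f"unsupported submissionMode: {mode!r}")
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    updated["spec"]["submissionMode"] = mode
    return updated


base = {
    "apiVersion": "ray.io/v1",
    "kind": "RayJob",
    "metadata": {"name": "quick-job"},
    # rayClusterSpec or clusterSelector omitted for brevity.
    "spec": {"entrypoint": "python main.py"},
}
http_job = set_submission_mode(base)
```

Validating against a fixed tuple catches typos in the mode name before the manifest ever reaches the API server.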

Yicheng-Lu-llll commented 3 days ago

You can use the runtime environment to install the dependencies: https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments. Simply add them here: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L87.
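For example, dependencies could be declared per job instead of being baked into separate images. This sketch makes the following assumptions:

- The linked field is the `runtimeEnvYAML` string field of the KubeRay v1 RayJob spec.
- The runtime environment uses the standard Ray runtime-env keys `pip` and `env_vars`.

The helper `with_runtime_env` and the package pin are hypothetical.

```python
import json
from typing import Dict, List, Optional


def with_runtime_env(
    manifest: dict,
    pip_packages: List[str],
    env_vars: Optional[Dict[str, str]] = None,
) -> dict:
    """Return a copy of a RayJob manifest carrying a runtime environment (hypothetical helper)."""
    runtime_env: dict = {"pip": list(pip_packages)}
    if env_vars:
        runtime_env["env_vars"] = dict(env_vars)
    # runtimeEnvYAML is a string; JSON is valid YAML, so json.dumps output can be used directly.
    spec = dict(manifest["spec"], runtimeEnvYAML=json.dumps(runtime_env))
    return dict(manifest, spec=spec)


base = {
    "apiVersion": "ray.io/v1",
    "kind": "RayJob",
    "metadata": {"name": "small-job"},
    "spec": {"entrypoint": "python main.py"},
}
job = with_runtime_env(base, ["requests==2.31.0"], {"MODE": "fast"})
print(job["spec"]["runtimeEnvYAML"])
```

Packages listed under `pip` are installed when the job's environment is set up. This trades image count for some start-up latency, which matters for short jobs like the ones described in the issue.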

shuaiyy commented 3 days ago

Thanks. We already use these features when running distributed training jobs.

shuaiyy commented 3 days ago

In the end, I decided to use KubeRay RayJobs for distributed training, and VolcanoJob or native Kubernetes Jobs for single-Pod jobs.

When running in a single Pod, I'm not sure whether it's okay to run ray start --head && ray job submit.
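Here is one way to script that sequence inside a single Pod. It assumes `ray[default]` is installed in the image, so that `ray start --head` also starts the dashboard's job server on its default port 8265. The helper names `single_pod_commands` and `run_single_pod_job` are hypothetical. `ray start`, `ray job submit` (with `--address` and `--`), and `ray stop` are the standard Ray CLI commands.

```python
import subprocess
from typing import List, Sequence

# Assumption: default dashboard/job-server address on the local head node.
DASHBOARD_ADDRESS = "http://127.0.0.1:8265"


def single_pod_commands(entrypoint: Sequence[str], address: str = DASHBOARD_ADDRESS) -> List[List[str]]:
    """Return the CLI steps for one job on a throwaway single-node Ray instance."""
    return [
        ["ray", "start", "--head"],
        # By default, ray job submit waits for the job and streams its logs.
        ["ray", "job", "submit", "--address", address, "--", *entrypoint],
    ]


def run_single_pod_job(entrypoint: Sequence[str], dry_run: bool = False) -> int:
    """Run the steps in order; stop at the first non-zero exit status."""
    commands = single_pod_commands(entrypoint)
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return 0
    try:
        for cmd in commands:
            result = subprocess.run(cmd)
            if result.returncode != 0:
                return result.returncode
        return 0
    finally:
        # Tear down the local Ray processes so the Pod can exit cleanly.
        subprocess.run(["ray", "stop"])
```

A Pod could call `run_single_pod_job(["python", "main.py"])` as its main step. Passing `dry_run=True` only prints the commands. As a design note, a script that calls `ray.init()` with no address starts a local Ray instance on its own when none is running. In a single Pod, running the script directly may therefore make the separate `ray start --head` step unnecessary.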