Open shuaiyy opened 5 days ago
Run a job in a local cluster without building a remote RayCluster
A RayJob can run on an existing cluster using the clusterSelector field. This way you can create a single RayCluster and then run multiple RayJobs against it. Would that work for you?
Run a job in a local cluster without building a remote RayCluster
you can create a single RayCluster and then run multiple RayJobs against it.
Thanks, but that doesn't work in my case. First, we want a quick run that returns a result, and we use several kinds of images with different pre-installed dependencies. Second, even with an existing cluster with HPA, KubeRay still needs to create a Kubernetes Job to submit the job. If there are not enough resources, scaling up the RayCluster through HPA costs additional seconds.
Could we run the RayJob inside the submitter Job's Pod instead?
KubeRay still needs to create a Kubernetes Job to submit the job
There's an HTTP submission mode that doesn't use a submitter Job: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L92
You can use the runtime environment to install the dependencies: https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments. Simply add them here: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L87.
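The suggestions so far (reusing a cluster through clusterSelector, the HTTP submission mode, and a runtime environment for dependencies) could be combined roughly as in the sketch below. It only builds a RayJob manifest as a Python dict and prints it as JSON, which kubectl also accepts. The job name, cluster name, entrypoint, and package list are hypothetical placeholders, and it assumes the installed KubeRay version exposes the submissionMode and runtimeEnvYAML fields referenced in the links above.

```python
import json


def build_rayjob_manifest(name, cluster_name, entrypoint, pip_packages):
    """Build a RayJob manifest (as a dict) that targets an existing RayCluster."""
    # runtimeEnvYAML is a YAML string; a short pip list is simple to render by hand.
    runtime_env_yaml = "pip:\n" + "".join(f"  - {pkg}\n" for pkg in pip_packages)
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            "entrypoint": entrypoint,
            # Select an already running RayCluster instead of creating a new one.
            "clusterSelector": {"ray.io/cluster": cluster_name},
            # Submit over HTTP so that no separate submitter Job is created.
            "submissionMode": "HTTPMode",
            "runtimeEnvYAML": runtime_env_yaml,
        },
    }


# All values below are placeholders for illustration only.
manifest = build_rayjob_manifest(
    name="small-job",
    cluster_name="shared-raycluster",
    entrypoint="python my_script.py",
    pip_packages=["requests", "numpy"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f. Note that runtime-environment packages are installed when the job starts, so this adds some start-up time of its own compared with pre-built images.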
Thanks. We already use these features when running distributed training jobs.
In the end, I decided to use a KubeRay RayJob for distributed training, and a VolcanoJob or a native Kubernetes Job for single-Pod jobs.
When running in a single Pod, I'm not sure whether it's okay to run ray start --head && ray job submit
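For reference, the single-Pod approach described above might look like the following sketch, which builds a plain Kubernetes Job manifest whose container starts a head-only Ray instance and then submits the script to it. This is an illustration under assumptions, not a pattern confirmed in this thread: the Job name, image tag, script name, and resource limits are placeholders, and it assumes the image includes the Ray dashboard so that the Jobs API listens on the default port 8265.

```python
import json
import shlex


def build_single_pod_job(name, image, script, cpu="1", memory="2Gi"):
    """Build a Kubernetes Job that runs Ray and the submitted job in one Pod."""
    # Start a head node inside the container, then submit to its local Jobs API.
    command = (
        "ray start --head && "
        "ray job submit --address http://127.0.0.1:8265 -- "
        f"python {shlex.quote(script)}"
    )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry the Pod automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "ray",
                            "image": image,
                            "command": ["/bin/bash", "-c", command],
                            "resources": {"limits": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                }
            },
        },
    }


# Placeholder values; pick the image that carries the needed dependencies.
job = build_single_pod_job("single-pod-ray", "rayproject/ray:latest", "train.py")
print(json.dumps(job, indent=2))
```

If the script only needs Ray inside one process, calling ray.init() with no address at the top of the script also starts a local Ray instance, which would avoid the separate ray start and ray job submit steps.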
Search before asking
Description
Run a job in a local cluster without building a remote RayCluster.
Use case
In my case, I have many small jobs that can each run on a single node with few resources and finish within 60 seconds. When using a RayCluster, it takes about 60 additional seconds to build the cluster before the job starts running its code.
Related issues
No response
Are you willing to submit a PR?