ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
32.93k stars 5.57k forks

[KubeRay, VM launcher] Provide clear guidelines for whether to have multi-tenant, long-running clusters for use cases other than Ray Serve #35250

Open scottsun94 opened 1 year ago

scottsun94 commented 1 year ago

Description

Users want clearer guidelines on whether to set up long-running, multi-tenant Ray clusters. We should make it clear that we don't recommend long-running clusters for use cases other than Ray Serve. See https://github.com/ray-project/ray/issues/35202#issuecomment-1542932603

cc: @gvspraveen @architkulkarni @kevin85421

Link

No response

scottsun94 commented 1 year ago

https://docs.ray.io/en/latest/cluster/faq.html#do-ray-clusters-support-multi-tenancy We do have documentation on multi-tenancy, which is not recommended for production use cases.

Whether people should set up long-running clusters is still unclear.

gvspraveen commented 1 year ago

Agreed. Our documentation also says: "On the other hand, you can run the same job multiple times using the same cluster to save the cluster startup time." But it is not clear whether this directly recommends long-running clusters.

I have heard of a few KubeRay use cases where we recommend using RayCluster resources to set up a long-running cluster and submit multiple jobs to it.
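
For readers unfamiliar with the pattern under discussion (one long-running cluster that receives several job submissions), here is a rough sketch using the Ray Jobs Python SDK. It is only an illustration, not a recommendation from this thread: the helper names `submit_all` and `connect`, the address `http://127.0.0.1:8265`, and the entrypoint strings are all assumptions, and the address in particular depends on how the cluster's dashboard is exposed.

```python
from typing import Iterable, List, Protocol


class JobClient(Protocol):
    """Anything exposing submit_job(entrypoint=...), e.g. Ray's JobSubmissionClient."""

    def submit_job(self, *, entrypoint: str) -> str: ...


def submit_all(client: JobClient, entrypoints: Iterable[str]) -> List[str]:
    """Submit each entrypoint to the same cluster and return the job IDs in order."""
    return [client.submit_job(entrypoint=e) for e in entrypoints]


def connect(address: str = "http://127.0.0.1:8265"):
    """Hypothetical helper: open a client against an already-running cluster.

    The default address is an assumption; use your cluster's dashboard URL.
    """
    # Imported lazily so submit_all can be exercised without Ray installed.
    from ray.job_submission import JobSubmissionClient

    return JobSubmissionClient(address)
```

With a cluster already running, something like `submit_all(connect(), ["python train.py", "python train.py"])` would reuse that cluster for both runs and skip the startup cost. Whether this is advisable outside Ray Serve is exactly the open question in this issue.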