Jeffwan opened this issue 2 months ago
Related to https://github.com/ray-project/kuberay/issues/527
There have been some discussions about adapting RayService to improve rolling upgrade support and to support N RayClusters. There have also been some early discussions about LWS-like behavior in KubeRay.
@andrewsykim thanks for the feedback. RayService would be a separate story. I think this proposal focuses more on arbitrary Ray applications rather than Ray Serve applications; Ray Serve applications should always use RayService. If RayService decides to support N RayClusters, components like RayClusterReplicaSet could serve as the building blocks, and those higher-level APIs could benefit from it.
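To make the building-block idea concrete, here is a minimal sketch of what such a resource might look like, expressed with the Kubernetes Python client. The kind name, API group/version, and every field below are illustrative assumptions, not the API from the proposal doc.

```python
# Hypothetical sketch only: the RayClusterReplicaSet kind, its API group/version,
# and all field names below are assumptions for illustration, not the actual proposal.
from kubernetes import client, config

config.load_kube_config()

ray_cluster_replica_set = {
    "apiVersion": "ray.io/v1alpha1",      # assumed group/version for the new kind
    "kind": "RayClusterReplicaSet",       # hypothetical kind
    "metadata": {"name": "my-app-replicaset"},
    "spec": {
        "replicas": 4,                    # desired number of identical RayClusters
        # Template stamped out for each replica, analogous to a ReplicaSet's
        # Pod template but producing RayCluster objects instead of Pods.
        "rayClusterTemplate": {
            "spec": {
                "headGroupSpec": {},      # standard RayCluster head spec (elided)
                "workerGroupSpecs": [],   # standard RayCluster worker specs (elided)
            }
        },
    },
}

# Once such a CRD existed, it would be created like any other custom resource:
client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1alpha1", namespace="default",
    plural="rayclusterreplicasets", body=ray_cluster_replica_set,
)
```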
@Jeffwan @andrewsykim Let's discuss the proposal after the v1.2.0 release.
Search before asking
/cc Bytedancer @Basasuya @Yicheng-Lu-llll
Description
The recent release of the Llama 3.1 405B model by Meta has significantly increased the demand for multi-node inference. However, existing inference frameworks such as vLLM do not natively provide robust solutions for managing such deployments at scale, which necessitates an external resource orchestration system to support and manage these complex inference workloads. vLLM adopts Ray as its default distributed executor, and KubeRay offers the best support for Ray, so KubeRay should be the top choice for orchestrating vLLM. There are still a few gaps in managing this efficiently, and we want to propose new APIs to better support such cases.
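For context on the workload, a multi-node vLLM deployment shards the model across nodes with tensor and pipeline parallelism and relies on Ray to place the workers. Below is a minimal sketch; the model name and parallelism degrees are illustrative values, not a recommended configuration.

```python
# Illustrative only: model name and parallelism degrees are example values.
# Run inside a Ray cluster (e.g. one provisioned by a KubeRay RayCluster);
# vLLM dispatches its workers across nodes through the Ray backend.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # example multi-node model
    tensor_parallel_size=8,               # GPUs per node used for tensor parallelism
    pipeline_parallel_size=2,             # number of nodes in the pipeline
    distributed_executor_backend="ray",   # place workers across nodes via Ray
)

outputs = llm.generate(["What is KubeRay?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```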
Use case
Support arbitrary Ray applications with multiple replicas. Some users prefer to use raw Ray APIs to build their applications. RayCluster can undoubtedly meet the needs of a single instance, but if a user prefers a one-to-one mapping between RayCluster and application instances, there is currently no way to run and manage multiple replicas of that pairing.
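To illustrate the gap: today, scaling this one-cluster-per-app pattern to N replicas means creating (and later cleaning up) N separate RayCluster custom resources yourself, for example with the Kubernetes Python client. The names and the heavily elided spec below are illustrative.

```python
# Illustrative only: cluster names and the (heavily elided) spec are example values.
# With a 1:1 RayCluster:app mapping, N app replicas currently means creating and
# garbage-collecting N independent RayCluster objects by hand or by script.
import copy
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

base_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "my-app"},
    "spec": {
        "headGroupSpec": {},        # head group spec elided
        "workerGroupSpecs": [],     # worker group specs elided
    },
}

NUM_REPLICAS = 4
for i in range(NUM_REPLICAS):
    cluster = copy.deepcopy(base_cluster)
    cluster["metadata"]["name"] = f"my-app-{i}"
    api.create_namespaced_custom_object(
        group="ray.io", version="v1", namespace="default",
        plural="rayclusters", body=cluster,
    )
```

A replica-set-style API could turn this scripted loop into a declarative, reconciled resource that handles scale up/down and replacement of failed clusters.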
Related issues
Detailed proposal doc: https://docs.google.com/document/d/1K8Ve6KrabpexH-gIEcby9tKTEFysTd6kOZKZa_EdgRQ/edit
Are you willing to submit a PR?