Jeffwan opened this issue 2 months ago
Related to https://github.com/ray-project/kuberay/issues/527
There have been some discussions about adapting RayService to improve rolling upgrade support and to support N RayClusters. There have also been some early discussions about LWS-like behavior in KubeRay.
@andrewsykim thanks for the feedback. RayService would be a separate story. I think this proposal focuses more on arbitrary Ray applications rather than Ray Serve applications; Ray Serve applications should always use RayService. If RayService decides to support N RayClusters, components like RayClusterReplicaSet could serve as the building blocks, and those higher-level APIs could benefit from it.
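To make the building-block idea concrete, here is a minimal sketch of what such a resource might look like, expressed with the Kubernetes Python client. The kind name, API group/version, and every field below are illustrative assumptions, not the API from the proposal doc.

```python
# Hypothetical sketch only: the RayClusterReplicaSet kind, its API group/version,
# and all field names below are assumptions for illustration, not the actual proposal.
from kubernetes import client, config

config.load_kube_config()

ray_cluster_replica_set = {
    "apiVersion": "ray.io/v1alpha1",      # assumed group/version for the new kind
    "kind": "RayClusterReplicaSet",       # hypothetical kind
    "metadata": {"name": "my-app-replicaset"},
    "spec": {
        "replicas": 4,                    # desired number of identical RayClusters
        # Template stamped out for each replica, analogous to a ReplicaSet's
        # Pod template but producing RayCluster objects instead of Pods.
        "rayClusterTemplate": {
            "spec": {
                "headGroupSpec": {},      # standard RayCluster head spec (elided)
                "workerGroupSpecs": [],   # standard RayCluster worker specs (elided)
            }
        },
    },
}

# Once such a CRD existed, it would be created like any other custom resource:
client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1alpha1", namespace="default",
    plural="rayclusterreplicasets", body=ray_cluster_replica_set,
)
```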
@Jeffwan @andrewsykim Let's discuss the proposal after the v1.2.0 release.
Search before asking
/cc Bytedancer @Basasuya @Yicheng-Lu-llll
Description
The recent release of the Llama 3.1 405B model by Meta has significantly increased the demand for multi-node inference. However, existing inference frameworks such as vLLM do not natively provide robust solutions for managing such deployments at scale, which necessitates an external resource orchestration system to support and manage these complex inference workloads. vLLM adopts Ray as its default distributed executor, and KubeRay offers the best support for Ray, so KubeRay should be the top choice for orchestrating vLLM. There are still a few gaps in managing this efficiently, and we want to propose new APIs to better support such cases.
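For context on the workload, a multi-node vLLM deployment shards the model across nodes with tensor and pipeline parallelism and relies on Ray to place the workers. Below is a minimal sketch; the model name and parallelism degrees are illustrative values, not a recommended configuration.

```python
# Illustrative only: model name and parallelism degrees are example values.
# Run inside a Ray cluster (e.g. one provisioned by a KubeRay RayCluster);
# vLLM dispatches its workers across nodes through the Ray backend.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # example multi-node model
    tensor_parallel_size=8,               # GPUs per node used for tensor parallelism
    pipeline_parallel_size=2,             # number of nodes in the pipeline
    distributed_executor_backend="ray",   # place workers across nodes via Ray
)

outputs = llm.generate(["What is KubeRay?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```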
Use case
Support arbitrary Ray applications with multiple replicas. Some users prefer to use raw Ray APIs to build their applications. RayCluster can undoubtedly meet the needs of a single instance, but if a user prefers a one-to-one mapping between RayCluster and application instances, there is currently no way to run and manage multiple replicas of that pairing.
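To illustrate the gap: today, scaling this one-cluster-per-app pattern to N replicas means creating (and later cleaning up) N separate RayCluster custom resources yourself, for example with the Kubernetes Python client. The names and the heavily elided spec below are illustrative.

```python
# Illustrative only: cluster names and the (heavily elided) spec are example values.
# With a 1:1 RayCluster:app mapping, N app replicas currently means creating and
# garbage-collecting N independent RayCluster objects by hand or by script.
import copy
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

base_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "my-app"},
    "spec": {
        "headGroupSpec": {},        # head group spec elided
        "workerGroupSpecs": [],     # worker group specs elided
    },
}

NUM_REPLICAS = 4
for i in range(NUM_REPLICAS):
    cluster = copy.deepcopy(base_cluster)
    cluster["metadata"]["name"] = f"my-app-{i}"
    api.create_namespaced_custom_object(
        group="ray.io", version="v1", namespace="default",
        plural="rayclusters", body=cluster,
    )
```

A replica-set-style API could turn this scripted loop into a declarative, reconciled resource that handles scale up/down and replacement of failed clusters.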
Related issues
Detailed proposal doc: https://docs.google.com/document/d/1K8Ve6KrabpexH-gIEcby9tKTEFysTd6kOZKZa_EdgRQ/edit
Are you willing to submit a PR?