Open · linnlh opened this issue 2 months ago
cc @richardliaw @rkooo567 if you need to update ray doc
@flliny we'd be happy to add some links in https://docs.vllm.ai/en/latest/serving/distributed_serving.html#multi-node-inference-and-serving to link to your URLs.
The vLLM team itself will not work on KubeRay integration.
Thanks for the reply. 🙏
There's still no official Ray Serve version, though. Does the vLLM team have any intention of integrating it? Kubernetes Deployments aren't well suited to distributed inference services, so I'm considering using either LWS or Ray Serve to deploy my service. If vLLM officially supported multi-replica distributed inference on Ray Serve, it would make deploying such services much easier.
@linnlh we have a proposal and have already built an internal version for this use case: https://docs.google.com/document/d/1K8Ve6KrabpexH-gIEcby9tKTEFysTd6kOZKZa_EdgRQ/edit#heading=h.fw9nktz8l24d. The OSS plan is on the way. Feel free to let me know your feedback.
🚀 The feature, motivation and pitch
Hi, I'm currently working on deploying vLLM across multiple nodes in a Kubernetes cluster. The official documentation links to an example that uses LWS to deploy vLLM for distributed model serving, and the KubeRay team also provides a Ray Serve-based solution for multi-node deployment. However, neither of these solutions has been integrated into the vLLM codebase.
I was wondering whether the vLLM team has any development plans for this. If so, I'd be happy to contribute code.
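For context, the kind of wrapper these examples build around looks roughly like the following. This is a minimal sketch, not an official integration: the class name, route, and sampling defaults are illustrative, and the engine API may differ slightly between vLLM versions.

```python
# Minimal sketch of a Ray Serve deployment wrapping vLLM's async engine.
# Class name, route, and defaults are illustrative, not part of any official
# integration; resource/placement-group settings are omitted for brevity.
from fastapi import FastAPI
from ray import serve

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self, engine_args: AsyncEngineArgs):
        # The async engine launches its workers as Ray actors, which is what
        # lets a single Serve replica front a multi-GPU / multi-node model.
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    @app.post("/generate")
    async def generate(self, prompt: str, max_tokens: int = 128) -> str:
        # Stream results from the engine and return the final completion.
        request_id = random_uuid()
        params = SamplingParams(max_tokens=max_tokens)
        final_output = None
        async for request_output in self.engine.generate(prompt, params, request_id):
            final_output = request_output
        return final_output.outputs[0].text
```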
Alternatives
I have tried the RayService example to deploy a 2-node serving setup, and it is supposed to work, but some parts of the code need to be modified to be compatible with the latest version of vLLM.
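For reference, this is roughly how the deployment sketched above gets bound and run for the 2-node case. The model name and parallelism values are placeholders, and the Ray-backend argument is one of the spots that depends on the vLLM version (newer releases select it via `distributed_executor_backend="ray"`, while older ones used `worker_use_ray=True`).

```python
# Sketch of binding/running the VLLMDeployment from the snippet above for a
# 2-node tensor-parallel setup (values are placeholders, not a working recipe).
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs

engine_args = AsyncEngineArgs(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model
    tensor_parallel_size=8,  # e.g. 2 nodes x 4 GPUs, scheduled by Ray
    distributed_executor_backend="ray",  # older vLLM versions used worker_use_ray=True
)

# serve.run starts (or connects to) the Ray cluster and deploys the app.
serve.run(VLLMDeployment.bind(engine_args), route_prefix="/")
```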
The related works are listed below: