Open sam-h-bean opened 12 months ago
Does this ticket mean that distributed serving is not supported on Kubernetes, even if the cluster has Ray installed per this quickstart guide?
https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/raycluster-quick-start.html
The docs are very sparse here and I am confused, since they imply Ray can be used for distributed inference: https://vllm.readthedocs.io/en/latest/serving/distributed_serving.html
Supporting only Ray for distributed inference will significantly limit adoption of this tool, even if it truly is more performant than TGI. TGI can be run as a black-box image on Kubernetes with support for sharded models, and vLLM should support this as well.
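For context, the distributed serving docs linked above describe a Ray-backed multi-GPU launch along these lines (a sketch based on that page; the model name and GPU count here are illustrative placeholders, not something stated in this issue):

```shell
# Install vLLM with the Ray dependency used for multi-GPU coordination
pip install vllm ray

# Launch the API server with the model sharded across 4 GPUs.
# --tensor-parallel-size sets how many GPUs the model is split over;
# vLLM uses Ray under the hood to manage the tensor-parallel workers.
python -m vllm.entrypoints.api_server \
    --model facebook/opt-13b \
    --tensor-parallel-size 4
```

The open question in this issue is whether that same launch works when the Ray cluster lives inside Kubernetes (e.g. a KubeRay-managed RayCluster), rather than on a bare-metal multi-GPU node.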