sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Feature] Multi-instance deployment #1649

Open vkc1vk opened 1 month ago

vkc1vk commented 1 month ago

Motivation

I want to host multiple instances of Llama 3 8B across multiple A100 GPUs, with load balancing handled automatically. Is there a way to do this currently?

Related resources

n/a

merrymercy commented 1 month ago

We support simple data parallelism. Could you try this? https://github.com/sgl-project/sglang/blob/69aa937aa528f0066ab5226bb428cbdf37dec048/README.md?plain=1#L217-L220
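
For readers who do not want to open the link, here is a rough sketch of what launching with the data-parallel option looks like. It only builds and prints the command, it does not start a server. The `--dp` and `--tp` flag names are written from memory of the README and should be checked against the linked lines; `build_launch_command` and `gpus_needed` are hypothetical helper names used only for illustration.

```python
import shlex


def build_launch_command(model_path: str, dp: int, tp: int = 1) -> list[str]:
    """Build the argv for one SGLang server that hosts `dp` model replicas.

    Assumption: `--dp` sets the number of data-parallel replicas and `--tp`
    the tensor-parallel size of each replica, as in the linked README.
    """
    if dp < 1 or tp < 1:
        raise ValueError("dp and tp must be positive integers")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--dp", str(dp),
        "--tp", str(tp),
    ]


def gpus_needed(dp: int, tp: int = 1) -> int:
    """Each replica spans `tp` GPUs, so the server needs dp * tp GPUs in total."""
    return dp * tp


# Example: four replicas of an 8B model, one GPU per replica.
cmd = build_launch_command("meta-llama/Meta-Llama-3-8B-Instruct", dp=4)
print(shlex.join(cmd))
print("GPUs required:", gpus_needed(dp=4))
```

With this setup, clients talk to a single server endpoint and the server spreads requests over the replicas, so no external load balancer is needed on a single node.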

vkc1vk commented 1 month ago

Thanks! Is this option available with sgl.Engine as well (batched offline mode)?

ByronHsu commented 1 month ago

Yes, you can use sgl.Engine with the current simple DP. We are working on a full-fledged version.
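
A minimal sketch of the offline path, assuming `sgl.Engine` forwards its keyword arguments to the same server arguments as the CLI, so that `dp_size` is the counterpart of the data-parallel flag. `engine_kwargs` and `run_offline` are hypothetical helper names, and the sampling parameters are placeholders. `sglang` is imported inside the function so the helper can be inspected on a machine without GPUs.

```python
def engine_kwargs(model_path: str, dp_size: int) -> dict:
    """Keyword arguments for sgl.Engine (assumed to mirror the server arguments)."""
    if dp_size < 1:
        raise ValueError("dp_size must be a positive integer")
    return {"model_path": model_path, "dp_size": dp_size}


def run_offline(prompts: list[str], model_path: str, dp_size: int) -> list[str]:
    """Generate completions for a batch of prompts with `dp_size` replicas."""
    import sglang as sgl  # imported lazily; requires sglang and GPUs

    llm = sgl.Engine(**engine_kwargs(model_path, dp_size))
    try:
        # Placeholder sampling parameters, not a recommendation.
        outputs = llm.generate(prompts, {"temperature": 0.0, "max_new_tokens": 32})
        return [out["text"] for out in outputs]
    finally:
        llm.shutdown()  # release the GPU worker processes
```

Because the engine starts worker processes, it is safest to call `run_offline` from under an `if __name__ == "__main__":` guard in a script.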

tsaoyu commented 5 days ago

What about multi-node? I am currently using Ray with vLLM, and it was not that straightforward to set up (it needed a hand-crafted PlacementGroup).

merrymercy commented 3 days ago

@tsaoyu try this? https://github.com/sgl-project/sglang/pull/2114

ByronHsu commented 2 days ago

We will merge #2114 very soon and release an experimental router that can be installed with pip.