[Feature] Multi-instance deployment

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

https://sgl-project.github.io/

Apache License 2.0

6.21k stars, 529 forks

Open vkc1vk opened 1 month ago

vkc1vk commented 1 month ago

[X] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
[X] 2. Please use English; otherwise, it will be closed.

I want to host multiple instances of Llama 3 8B across multiple A100 GPUs (with load balancing handled). Is there a way to do this currently?
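
To make the request concrete: "load balancing handled" could be as simple as rotating requests across several independently launched server instances. The following is a minimal client-side sketch of round-robin dispatch, not SGLang's own implementation. The `RoundRobinBalancer` class, the `assign` helper, and the endpoint URLs are all hypothetical names introduced for illustration.

```python
from itertools import cycle
from typing import Iterator, List, Tuple


class RoundRobinBalancer:
    """Hand out instance URLs in a fixed rotating order."""

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle: Iterator[str] = cycle(self._endpoints)

    def next_endpoint(self) -> str:
        """Return the next endpoint in the rotation."""
        return next(self._cycle)


def assign(prompts: List[str], balancer: RoundRobinBalancer) -> List[Tuple[str, str]]:
    """Pair each prompt with the endpoint that should serve it."""
    return [(balancer.next_endpoint(), prompt) for prompt in prompts]


# Hypothetical URLs: one server instance per GPU, each on its own port.
ENDPOINTS = ["http://127.0.0.1:30000", "http://127.0.0.1:30001"]

balancer = RoundRobinBalancer(ENDPOINTS)
plan = assign(["prompt A", "prompt B", "prompt C"], balancer)
for endpoint, prompt in plan:
    print(f"{prompt} -> {endpoint}")
```

Round-robin ignores how busy each instance is, so a production router would likely also track queue length or cache locality. The sketch only shows the dispatch idea.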


merrymercy commented 1 month ago

vkc1vk commented 1 month ago

Thanks! Is this option also available with sgl.Engine (batched offline mode)?

ByronHsu commented 1 month ago

Yes, you can use sgl.Engine with the current simple DP (data parallelism). We are working on the full-fledged version.
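
As a rough illustration of the reply above, offline batched generation with data parallelism might look like the sketch below. It assumes that `sgl.Engine` accepts a `dp_size` keyword argument mirroring the server's data-parallel setting, and that the model path shown is the one you want; both are assumptions, not details confirmed in this thread. The helper names `engine_kwargs` and `run_offline` are hypothetical.

```python
from typing import Any, Dict, List


def engine_kwargs(model_path: str, num_replicas: int) -> Dict[str, Any]:
    """Build keyword arguments for the engine.

    Assumption: `dp_size` is accepted by sgl.Engine and sets the number of
    data-parallel replicas (one model copy per GPU).
    """
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    return {"model_path": model_path, "dp_size": num_replicas}


def run_offline(prompts: List[str], model_path: str, num_replicas: int):
    """Generate completions for a batch of prompts, then release the GPUs."""
    import sglang as sgl  # imported lazily; requires sglang and GPUs

    llm = sgl.Engine(**engine_kwargs(model_path, num_replicas))
    try:
        sampling_params = {"temperature": 0.0, "max_new_tokens": 32}
        return llm.generate(prompts, sampling_params)
    finally:
        llm.shutdown()


# Assumed model path and replica count, for illustration only.
kwargs = engine_kwargs("meta-llama/Meta-Llama-3-8B-Instruct", 4)
print(kwargs)
```

Calling `run_offline(["Hello"], "meta-llama/Meta-Llama-3-8B-Instruct", 4)` on a machine with four GPUs would then be the actual entry point. Run it under an `if __name__ == "__main__":` guard, since the engine spawns worker processes.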

tsaoyu commented 5 days ago

What about multi-node? I am currently using Ray with vLLM, and setting that up was not straightforward (it required a hand-crafted PlacementGroup).

merrymercy commented 3 days ago

ByronHsu commented 2 days ago

We will merge #2114 very soon and release an experimental router that can be installed with pip.