sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0
5.62k stars, 435 forks

[Feature] Multi-instance deployment #1649

Open vkc1vk opened 4 days ago

vkc1vk commented 4 days ago

Motivation

I want to host multiple instances of Llama 3 8B across multiple A100 GPUs (with load balancing handled). Is there a way to do this currently?

Related resources

n/a

merrymercy commented 3 days ago

We support simple data parallelism. Could you try this? https://github.com/sgl-project/sglang/blob/69aa937aa528f0066ab5226bb428cbdf37dec048/README.md?plain=1#L217-L220
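
For readers who cannot open the link, here is a minimal sketch of what launching a data-parallel server might look like. It assumes the linked README lines refer to the data-parallel size option of `sglang.launch_server` (spelled `--dp-size` here). The helper name `build_launch_command`, the model ID, and the port are illustrative choices, not taken from this thread.

```python
import shlex
import sys


def build_launch_command(model_path, dp_size, tp_size=1, port=30000):
    """Build the argv list for one SGLang server that runs dp_size replicas.

    Hypothetical helper for illustration; the total number of GPUs used
    is assumed to be dp_size * tp_size.
    """
    if dp_size < 1 or tp_size < 1:
        raise ValueError("dp_size and tp_size must be positive")
    return [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--dp-size", str(dp_size),  # number of data-parallel replicas
        "--tp-size", str(tp_size),  # GPUs per replica (tensor parallelism)
        "--port", str(port),
    ]


# Example: four replicas of an 8B model, one GPU each (assumed model ID).
cmd = build_launch_command("meta-llama/Meta-Llama-3-8B-Instruct", dp_size=4)
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```

With this setup a single endpoint fronts all replicas, so clients send requests to one port and the server distributes them across the data-parallel workers.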

vkc1vk commented 3 days ago

Thanks! Is this option also available with sgl.Engine (batched offline mode)?

ByronHsu commented 3 days ago

Yes, you can use sgl.Engine with the current simple DP (data parallelism). We are working on the full-fledged version.
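
A rough sketch of the offline case follows. It assumes `sgl.Engine` forwards its keyword arguments to the same server arguments the launcher uses, so that `dp_size` is accepted, and that `generate` returns one dict with a `"text"` field per prompt. The helper names `engine_kwargs` and `run_offline_batch` and the model ID are hypothetical, and the exact Engine behavior may differ between SGLang versions.

```python
def engine_kwargs(model_path, dp_size):
    """Collect the keyword arguments passed to sgl.Engine (assumed names)."""
    if dp_size < 1:
        raise ValueError("dp_size must be positive")
    return {"model_path": model_path, "dp_size": dp_size}


def run_offline_batch(prompts, model_path, dp_size=2):
    """Generate completions for a batch of prompts with a data-parallel Engine."""
    import sglang as sgl  # imported lazily so the helpers load without SGLang

    llm = sgl.Engine(**engine_kwargs(model_path, dp_size))
    try:
        sampling_params = {"temperature": 0, "max_new_tokens": 32}
        outputs = llm.generate(prompts, sampling_params)
        return [out["text"] for out in outputs]
    finally:
        llm.shutdown()  # release the GPU worker processes


# Usage (needs GPUs and SGLang installed; call it from under an
# `if __name__ == "__main__":` guard because the Engine spawns subprocesses):
#   texts = run_offline_batch(["Hello, my name is"],
#                             "meta-llama/Meta-Llama-3-8B-Instruct", dp_size=2)
```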