Open vkc1vk opened 1 month ago
We support simple data parallelism. Could you try this? https://github.com/sgl-project/sglang/blob/69aa937aa528f0066ab5226bb428cbdf37dec048/README.md?plain=1#L217-L220
Thanks! Is this option also available with sgl.Engine (batched offline mode)?
Yes, you can use sgl.Engine with the current simple DP. We are working on the full-fledged version.
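For anyone following along, here is a minimal sketch of how a data-parallel launch command could be assembled. It assumes the `--dp-size` and `--tp-size` flags of `sglang.launch_server`; the helper names `build_launch_command` and `gpus_required` are hypothetical, and the model ID is only an example. Check the linked README for the exact options in your version.

```python
def gpus_required(dp_size: int, tp_size: int) -> int:
    """Each data-parallel replica spans tp_size GPUs, so the total is the product."""
    return dp_size * tp_size


def build_launch_command(model_path: str, dp_size: int = 1, tp_size: int = 1) -> list[str]:
    """Build an argv list for launching one server with dp_size replicas.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if dp_size < 1 or tp_size < 1:
        raise ValueError("dp_size and tp_size must be >= 1")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--dp-size", str(dp_size),  # number of model replicas (assumed flag name)
        "--tp-size", str(tp_size),  # GPUs per replica (assumed flag name)
    ]


if __name__ == "__main__":
    # Example: four replicas of an 8B model, one GPU each.
    cmd = build_launch_command("meta-llama/Meta-Llama-3-8B-Instruct", dp_size=4)
    print(" ".join(cmd))
    print("GPUs needed:", gpus_required(4, 1))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine with enough GPUs.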
What about multi-node? I am currently using Ray with vLLM, and it was not straightforward to set up (it required a hand-crafted PlacementGroup).
@tsaoyu try this? https://github.com/sgl-project/sglang/pull/2114
We will merge #2114 very soon and release an experimental router that can be installed with pip.
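To illustrate what load balancing across several server instances amounts to, here is a minimal client-side round-robin sketch. This is not the router from #2114 and does not reflect its API; the class name `RoundRobinBalancer` and the worker URLs are hypothetical, and it assumes each URL points at an independently launched server.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hand out worker URLs in a fixed rotating order (hypothetical helper)."""

    def __init__(self, worker_urls: list[str]) -> None:
        if not worker_urls:
            raise ValueError("at least one worker URL is required")
        self._cycle = itertools.cycle(list(worker_urls))
        self._lock = threading.Lock()  # makes next_worker() safe across threads

    def next_worker(self) -> str:
        """Return the URL that should receive the next request."""
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    # Assumed setup: one server per GPU, each on its own port.
    balancer = RoundRobinBalancer([
        "http://localhost:30000",
        "http://localhost:30001",
    ])
    for _ in range(4):
        print(balancer.next_worker())
```

Round-robin ignores per-worker queue length and cache state, so it is only a baseline; a dedicated router can make better-informed choices.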
Motivation
I want to host multiple instances of Llama 3 8B across multiple A100 GPUs (with load balancing handled). Is there a way to do this currently?
Related resources
n/a