Open · vkc1vk opened this issue 4 days ago
We support simple data parallelism. Could you try this? https://github.com/sgl-project/sglang/blob/69aa937aa528f0066ab5226bb428cbdf37dec048/README.md?plain=1#L217-L220
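Simple data parallelism generally means running several complete copies (replicas) of the model and spreading incoming requests across them. The minimal sketch below shows that idea using plain round-robin dispatch. The class and replica names are hypothetical, and SGLang's own scheduler may use a different balancing policy than strict rotation.

```python
import itertools
from collections import Counter


class RoundRobinDispatcher:
    """Hypothetical round-robin dispatcher over identical model replicas.

    Each replica holds a full copy of the model weights, so any replica can
    serve any request on its own; the dispatcher only decides which one.
    """

    def __init__(self, replicas: list[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        # itertools.cycle repeats the replica list forever: r0, r1, ..., r0, r1, ...
        self._cycle = itertools.cycle(list(replicas))

    def dispatch(self, request: str) -> str:
        """Return the replica that should handle `request` (the next one in turn)."""
        return next(self._cycle)


if __name__ == "__main__":
    dispatcher = RoundRobinDispatcher(["replica-0", "replica-1"])
    assignments = [dispatcher.dispatch(f"prompt {i}") for i in range(6)]
    print(assignments)           # alternates replica-0, replica-1, ...
    print(Counter(assignments))  # each replica receives 3 of the 6 prompts
```

Round-robin is the simplest policy and works well when requests are of similar size. With very uneven prompt or output lengths, a policy that tracks outstanding work per replica usually balances better.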
Thanks! Is this option also available with sgl.Engine (batched offline mode)?
Yes, you can use sgl.Engine with the current simple DP (data parallelism). We are working on a full-fledged version.
Checklist
Motivation
I want to host multiple instances of Llama 3 8B across multiple A100 GPUs, with load balancing across the instances. Is there a way to do this currently?
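A rough memory estimate helps choose a layout for this setup. Llama 3 8B has about 8 billion parameters, so its weights take roughly 16 GB in 16-bit precision (8 billion × 2 bytes). That fits on a single A100 (40 GB or 80 GB) with room left for the KV cache, so one replica per GPU is a natural starting point. The helper below is a hypothetical sketch, not part of SGLang, of how GPUs could be grouped into replicas for a given data-parallel / tensor-parallel split. It assumes each replica gets a contiguous block of GPU ids, which may not match how SGLang actually places workers.

```python
def plan_replicas(num_gpus: int, tp_size: int = 1) -> list[list[int]]:
    """Hypothetical helper: split GPU ids 0..num_gpus-1 into replica groups.

    Each group of `tp_size` GPUs hosts one full model replica (tensor
    parallelism inside the group); the number of groups is the
    data-parallel size, i.e. num_gpus // tp_size.
    """
    if num_gpus < 1 or tp_size < 1 or num_gpus % tp_size != 0:
        raise ValueError("num_gpus must be a positive multiple of tp_size")
    dp_size = num_gpus // tp_size
    # Replica r uses GPUs r*tp_size .. (r+1)*tp_size - 1 (contiguous assumption).
    return [list(range(r * tp_size, (r + 1) * tp_size)) for r in range(dp_size)]


if __name__ == "__main__":
    # 8B model fits on one A100: one replica per GPU on a 4-GPU node.
    print(plan_replicas(4))             # [[0], [1], [2], [3]]
    # Same node, but each replica sharded over 2 GPUs: 2 replicas.
    print(plan_replicas(4, tp_size=2))  # [[0, 1], [2, 3]]
```

For an 8B model on A100s, tp_size=1 gives the most replicas and typically the best throughput. Larger tp_size values mainly make sense when a single GPU cannot hold the weights plus a useful KV cache.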
Related resources
n/a