sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0
2.75k stars, 177 forks

SG-Lang Runtime Stuck Launching in Docker Container #527

Open · schopra8 opened this issue 2 weeks ago

schopra8 commented 2 weeks ago

We're trying to run the latest version of SGLang in a Docker container (PyTorch 2.3.0, CUDA 12.1), but instantiating the Runtime hangs: it starts loading the model onto the GPU and then stops making progress.

We've been able to run SGLang without any problems on the host operating system, so we ran pip freeze on the host and installed those exact package versions inside the Docker container, but we still hit the same hang while the model loads.

Has anyone seen this issue before? Any ideas what might be going wrong?

schopra8 commented 2 weeks ago

I've tracked the hang down to a single line, but I have no idea why it's a problem:

https://github.com/sgl-project/sglang/blob/542bc733d6ebb6da2554704fc101830a07791584/python/sglang/srt/managers/controller/model_runner.py#L242

I'm running with dp=1 and tp=1 (i.e., on a single GPU). If I call torch.cuda.set_device(0) in my Python script before creating the Runtime, everything works as expected.
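A minimal sketch of that workaround, assuming the `sgl.Runtime` / `sgl.set_default_backend` Python API of the SGLang version discussed in this thread, and assuming dp and tp both default to 1 so no extra arguments are needed for a single GPU. `pin_gpu` and `start_runtime` are hypothetical helper names, and the model path is a placeholder you supply. The sketch only reproduces the ordering that worked here (select GPU 0 first, then build the Runtime); it does not explain why the container hangs without it.

```python
def pin_gpu(device_index: int = 0) -> None:
    """Make `device_index` the current CUDA device for this process."""
    import torch  # imported lazily so defining this helper needs no GPU

    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device is visible inside the container")
    torch.cuda.set_device(device_index)


def start_runtime(model_path: str):
    """Create an SGLang Runtime on a single GPU (dp=1, tp=1 assumed as defaults)."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    # Workaround from this issue: select GPU 0 *before* the Runtime is built.
    pin_gpu(0)
    runtime = sgl.Runtime(model_path=model_path)
    sgl.set_default_backend(runtime)
    return runtime
```

Usage: call `runtime = start_runtime("<your model path>")` at the top of the script, run your SGLang programs, and call `runtime.shutdown()` when finished so the server subprocesses exit.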