Closed: halexan closed this issue 1 month ago
Thanks for reporting this. It has been fixed by https://github.com/sgl-project/sglang/commit/a68cb201dd5f4ae6155b324d22054bbb0de15fba. We also released a new version that includes this fix. Can you try v0.3.1.post3?
Maybe try updating the vllm version; there have been about 10 releases since. It is crashing for me too, and I just wasted 4 hours on 4xL40S paid GPUs. Basic software like https://github.com/oobabooga/text-generation-webui/ can utilize GPUs better than vllm, and vllm still has no optimizations for AWQ, GGUF, or GPTQ at all. They recommend loading unquantized models only. I don't know why everyone is recommending vllm.
Checklist
Describe the bug
Bug report:
Reproduction
Environment
Docker image: lmsysorg/sglang:v0.3.1.post2-cu124