sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/

[Question] RuntimeError: Initialization failed #2193

Open · LiYuhang9527 opened 5 days ago

LiYuhang9527 commented 5 days ago


Describe the bug

INFO 11-26 14:23:34 weight_utils.py:243] Using model weights format ['*.safetensors']
INFO 11-26 14:23:35 weight_utils.py:288] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.28it/s]

[2024-11-26 14:25:58] Initialization failed. warmup error: Traceback (most recent call last):
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/srt/server.py", line 579, in _wait_and_warmup
    assert res.status_code == 200, f"{res=}, {res.text=}"
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError: res=<Response [502]>, res.text=''

Traceback (most recent call last):
  File "/home/liyuhang/sglang-test/local_example_chat.py", line 60, in <module>
    runtime = sgl.Runtime(model_path="meta-llama/Llama-3.2-1B-Instruct")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/api.py", line 41, in Runtime
    return Runtime(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/srt/server.py", line 684, in __init__
    raise RuntimeError(
RuntimeError: Initialization failed. Please see the error messages above.

Reproduction

command: python local_example_chat.py
model: meta-llama/Llama-3.2-1B-Instruct
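The traceback points at the `sgl.Runtime(...)` call on line 60 of the script. For context, here is a minimal sketch of a script that exercises the same call, modeled on SGLang's frontend quick-start pattern; the exact contents of local_example_chat.py may differ, and the questions below are placeholders:

```python
# Sketch only: modeled on the SGLang frontend quick-start pattern;
# the real local_example_chat.py may differ.
import sglang as sgl


@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))


if __name__ == "__main__":
    # This constructor spawns a local server process and blocks until a
    # warmup request succeeds; the RuntimeError above is raised here when
    # the warmup request returns a non-200 status (502 in this report).
    runtime = sgl.Runtime(model_path="meta-llama/Llama-3.2-1B-Instruct")
    sgl.set_default_backend(runtime)

    state = multi_turn_question.run(
        question_1="What is the capital of France?",
        question_2="List two landmarks there.",
    )
    print(state["answer_1"])
    print(state["answer_2"])

    runtime.shutdown()
```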

Environment

Python: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:27:36) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA RTX A6000
GPU 0,1 Compute Capability: 8.6
CUDA_HOME: /usr/local/cuda-12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.99
CUDA Driver Version: 550.107.02
PyTorch: 2.4.0
sglang: 0.3.6.post1
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.3
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.11.7
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
psutil: 6.1.0
pydantic: 2.10.1
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.3.post1
openai: 1.55.1
anthropic: 0.39.0

NVIDIA Topology:
      GPU0  GPU1  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0   X    NODE  0-15,32-47    0              N/A
GPU1  NODE   X    0-15,32-47    0              N/A

Legend:

X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 1024

merrymercy commented 12 hours ago

It works for me. Can you try this example? https://sgl-project.github.io/start/send_request.html
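That page separates the two steps: launch the server as a standalone process, then send a plain HTTP request. This also makes it easier to see whether the server itself comes up. A sketch of that flow, assuming the documented launch command and the native /generate endpoint (port and sampling parameters are illustrative):

```python
# Sketch of the linked send-request flow. First launch the server in a
# separate terminal (port is illustrative):
#
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.2-1B-Instruct --port 30000
#
# then query the native /generate endpoint over HTTP:
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
)
print(response.json())
```

If the launch step fails, its console output should show the underlying error that surfaces only as a 502 during the Runtime warmup.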