Checklist
[X] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
INFO 11-26 14:23:34 weight_utils.py:243] Using model weights format ['*.safetensors']
INFO 11-26 14:23:35 weight_utils.py:288] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.28it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.28it/s]
[2024-11-26 14:25:58] Initialization failed. warmup error: Traceback (most recent call last):
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/srt/server.py", line 579, in _wait_and_warmup
    assert res.status_code == 200, f"{res=}, {res.text=}"
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError: res=<Response [502]>, res.text=''
Traceback (most recent call last):
  File "/home/liyuhang/sglang-test/local_example_chat.py", line 60, in <module>
    runtime = sgl.Runtime(model_path="meta-llama/Llama-3.2-1B-Instruct")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/api.py", line 41, in Runtime
    return Runtime(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liyuhang/miniconda3/envs/sglang/lib/python3.12/site-packages/sglang/srt/server.py", line 684, in __init__
    raise RuntimeError(
RuntimeError: Initialization failed. Please see the error messages above.
Reproduction
command: python local_example_chat.py
model: meta-llama/Llama-3.2-1B-Instruct
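The contents of local_example_chat.py are not included in this report; a minimal sketch of the failing call path, inferred from the traceback above and assuming the script follows the standard sglang frontend-language example, would look like this:

```python
import sglang as sgl

# Minimal sketch of the failing call, inferred from the traceback above.
# Only the Runtime construction is confirmed by the traceback; the rest of
# local_example_chat.py is assumed, not shown in this issue.
if __name__ == "__main__":
    # Fails during server warmup with:
    #   AssertionError: res=<Response [502]>, res.text=''
    runtime = sgl.Runtime(model_path="meta-llama/Llama-3.2-1B-Instruct")
    sgl.set_default_backend(runtime)
    runtime.shutdown()
```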
Environment
Python: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:27:36) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA RTX A6000
GPU 0,1 Compute Capability: 8.6
CUDA_HOME: /usr/local/cuda-12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.99
CUDA Driver Version: 550.107.02
PyTorch: 2.4.0
sglang: 0.3.6.post1
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.3
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.11.7
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
psutil: 6.1.0
pydantic: 2.10.1
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.3.post1
openai: 1.55.1
anthropic: 0.39.0
NVIDIA Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    0-15,32-47      0               N/A
GPU1    NODE     X      0-15,32-47      0               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
ulimit soft: 1024