[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. The bug has not been fixed in the latest version.
[X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
[X] 5. Please use English, otherwise it will be closed.
Describe the bug
Occasionally, sglang crashes with a seemingly random "IndexError" while serving Qwen2-VL-7B models. After the crash, sglang usually livelocks: the process does not exit, but it stops serving new requests.
I have tried rerunning the same requests in a local interactive environment, but unfortunately I cannot get an exact repro case.
2024-11-25T12:44:20.292025261Z 2024-11-25 12:44:20,291 - sglang - INFO - Traceback (most recent call last):
2024-11-25T12:44:20.292027162Z 2024-11-25 12:44:20,291 - sglang - INFO - File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
2024-11-25T12:44:20.292029103Z 2024-11-25 12:44:20,291 - sglang - INFO - self.run()
2024-11-25T12:44:20.292030905Z 2024-11-25 12:44:20,291 - sglang - INFO - File "/usr/lib/python3.11/threading.py", line 982, in run
2024-11-25T12:44:20.292036434Z 2024-11-25 12:44:20,291 - sglang - INFO - self._target(*self._args, **self._kwargs)
2024-11-25T12:44:20.292076086Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 93, in forward_thread_func
2024-11-25T12:44:20.292096682Z 2024-11-25 12:44:20,292 - sglang - INFO - self.forward_thread_func_()
2024-11-25T12:44:20.292161264Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2024-11-25T12:44:20.292182642Z 2024-11-25 12:44:20,292 - sglang - INFO - return func(*args, **kwargs)
2024-11-25T12:44:20.292204657Z 2024-11-25 12:44:20,292 - sglang - INFO - ^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292255277Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 120, in forward_thread_func_
2024-11-25T12:44:20.292297851Z 2024-11-25 12:44:20,292 - sglang - INFO - logits_output, next_token_ids = self.worker.forward_batch_generation(
2024-11-25T12:44:20.292338584Z 2024-11-25 12:44:20,292 - sglang - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292369450Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker.py", line 147, in forward_batch_generation
2024-11-25T12:44:20.292390642Z 2024-11-25 12:44:20,292 - sglang - INFO - forward_batch = ForwardBatch.init_new(model_worker_batch, self.model_runner)
2024-11-25T12:44:20.292434844Z 2024-11-25 12:44:20,292 - sglang - INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292452438Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 266, in init_new
2024-11-25T12:44:20.292496276Z 2024-11-25 12:44:20,292 - sglang - INFO - ret.compute_mrope_positions(model_runner, batch)
2024-11-25T12:44:20.292534921Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 190, in compute_mrope_positions
2024-11-25T12:44:20.292546295Z 2024-11-25 12:44:20,292 - sglang - INFO - MRotaryEmbedding.get_input_positions(
2024-11-25T12:44:20.292570510Z 2024-11-25 12:44:20,292 - sglang - INFO - File "/usr/local/lib/python3.11/dist-packages/sglang/srt/layers/rotary_embedding.py", line 48, in get_input_positions
2024-11-25T12:44:20.292595830Z 2024-11-25 12:44:20,292 - sglang - INFO - image_grid_thw[image_index][0],
2024-11-25T12:44:20.292621161Z 2024-11-25 12:44:20,292 - sglang - INFO - ~~~~~~~~~~~~~~^^^^^^^^^^^^^
2024-11-25T12:44:20.292645107Z 2024-11-25 12:44:20,292 - sglang - INFO - IndexError: list index out of range
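The traceback starts in `threading.py`, so the exception is raised inside a worker thread rather than the main thread. In Python, an unhandled exception in a thread ends only that thread, which would be consistent with the process staying up while no longer serving requests. The following is a minimal standalone sketch of that behavior, not sglang code: `forward_worker`, `run_demo`, and the batch dictionary layout are hypothetical names made up for illustration, and I have not confirmed that this is exactly what happens inside sglang.

```python
import queue
import threading


def forward_worker(inbox, outbox):
    # Hypothetical stand-in for a forward thread: handle batches until one raises.
    while True:
        batch = inbox.get()
        grid = batch["image_grid_thw"]
        # Raises IndexError when image_index points past the end of the grid list.
        outbox.put(grid[batch["image_index"]])


def run_demo(timeout=0.5):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=forward_worker, args=(inbox, outbox), daemon=True
    )
    worker.start()

    # First request indexes out of range, so the worker thread dies.
    inbox.put({"image_grid_thw": [(1, 2, 2)], "image_index": 1})
    worker.join(timeout)

    # Second request is valid, but no thread is left to consume it.
    inbox.put({"image_grid_thw": [(1, 2, 2)], "image_index": 0})
    try:
        outbox.get(timeout=timeout)
        served = True
    except queue.Empty:
        served = False

    # The calling thread (and so the process) is still running at this point.
    return worker.is_alive(), served
```

In this sketch the main thread keeps running after the worker dies and the second request is never answered, which matches the symptom described above.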
Reproduction
This is using v0.3.6 on an H100.
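Since I do not have a repro, one thing that might help capture one is a check that runs before the forward pass and compares the number of image placeholders in the prompt with the number of entries in `image_grid_thw`. The failing expression `image_grid_thw[image_index][0]` suggests that the prompt references more images than there are grid entries, but that is an assumption on my part. The sketch below is not sglang code: the function names and the token ID values `1000` and `1001` are hypothetical placeholders, and real values would have to come from the model's config.

```python
def count_image_placeholders(input_ids, vision_start_token_id, image_token_id):
    """Count positions where a vision-start token is directly followed by an image token."""
    return sum(
        1
        for cur, nxt in zip(input_ids, input_ids[1:])
        if cur == vision_start_token_id and nxt == image_token_id
    )


def check_image_grid(input_ids, image_grid_thw, vision_start_token_id, image_token_id):
    """Raise a descriptive error instead of a bare IndexError on a mismatch."""
    expected = count_image_placeholders(
        input_ids, vision_start_token_id, image_token_id
    )
    provided = len(image_grid_thw)
    if expected > provided:
        raise ValueError(
            f"prompt references {expected} image(s) but image_grid_thw "
            f"has only {provided} entr{'y' if provided == 1 else 'ies'}"
        )
    return expected, provided


# Hypothetical token IDs, for illustration only.
VISION_START, IMAGE_PAD = 1000, 1001
```

Logging the offending `input_ids` when this check fails would give a concrete request to replay.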
Environment
/bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
Python: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA H100 80GB HBM3
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Not Available
CUDA Driver Version: 525.147.05
PyTorch: 2.5.1+cu124
sglang: 0.3.6
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.46.3
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.10.11
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.26.2
interegular: 0.3.3
psutil: 6.1.0
pydantic: 2.10.1
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.55.0
anthropic: 0.39.0
NVIDIA Topology:
        GPU0    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    CPU Affinity    NUMA Affinity
GPU0    X       PIX     NODE    NODE    NODE    SYS     SYS     0-47,96-143     0
NIC0    PIX     X       NODE    NODE    NODE    SYS     SYS
NIC1    NODE    NODE    X       PIX     NODE    SYS     SYS
NIC2    NODE    NODE    PIX     X       NODE    SYS     SYS
NIC3    NODE    NODE    NODE    NODE    X       SYS     SYS
NIC4    SYS     SYS     SYS     SYS     SYS     X       NODE
NIC5    SYS     SYS     SYS     SYS     SYS     NODE    X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
ulimit soft: 1048576