sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Bug] Qwen2-VL-7B IndexError #2181

Open jakep-allenai opened 7 hours ago

jakep-allenai commented 7 hours ago

Describe the bug

Occasionally, sglang crashes with a seemingly random "IndexError" while serving Qwen2-VL-7B models. After the crash, sglang usually appears to livelock: the process does not exit, but it can no longer serve new requests.

I have tried rerunning the same requests in a local interactive environment, but unfortunately I have not been able to get an exact repro case.
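The hang described above is consistent with general Python behavior: an uncaught exception ends only the thread it occurred in, and the rest of the process keeps running. The following standalone sketch illustrates that behavior and one way to detect it with `threading.excepthook` (Python 3.8+). It is not sglang code; `forward_loop`, `on_thread_exception`, and `worker_failed` are hypothetical names chosen for illustration.

```python
import threading

# Hypothetical flag: a real server could use it to fail health checks or exit.
worker_failed = threading.Event()


def forward_loop():
    # Stand-in for a worker loop that indexes past the end of a list.
    grid = []
    grid[0]  # raises IndexError inside the thread


def on_thread_exception(args):
    # Python calls this hook for uncaught exceptions raised in threads.
    print(f"thread died: {args.exc_type.__name__}: {args.exc_value}")
    worker_failed.set()


threading.excepthook = on_thread_exception

t = threading.Thread(target=forward_loop, name="forward", daemon=True)
t.start()
t.join()

# The main thread is still running even though the worker thread is gone.
print("worker alive:", t.is_alive())
print("failure recorded:", worker_failed.is_set())
```

With a hook like this, the main loop can check the flag and shut down cleanly instead of accepting requests that will never be processed.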

2024-11-25T12:44:20.292025261Z 2024-11-25 12:44:20,291 - sglang - INFO - Traceback (most recent call last):
2024-11-25T12:44:20.292027162Z 2024-11-25 12:44:20,291 - sglang - INFO -   File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
2024-11-25T12:44:20.292029103Z 2024-11-25 12:44:20,291 - sglang - INFO -     self.run()
2024-11-25T12:44:20.292030905Z 2024-11-25 12:44:20,291 - sglang - INFO -   File "/usr/lib/python3.11/threading.py", line 982, in run
2024-11-25T12:44:20.292036434Z 2024-11-25 12:44:20,291 - sglang - INFO -     self._target(*self._args, **self._kwargs)
2024-11-25T12:44:20.292076086Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 93, in forward_thread_func
2024-11-25T12:44:20.292096682Z 2024-11-25 12:44:20,292 - sglang - INFO -     self.forward_thread_func_()
2024-11-25T12:44:20.292161264Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2024-11-25T12:44:20.292182642Z 2024-11-25 12:44:20,292 - sglang - INFO -     return func(*args, **kwargs)
2024-11-25T12:44:20.292204657Z 2024-11-25 12:44:20,292 - sglang - INFO -            ^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292255277Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 120, in forward_thread_func_
2024-11-25T12:44:20.292297851Z 2024-11-25 12:44:20,292 - sglang - INFO -     logits_output, next_token_ids = self.worker.forward_batch_generation(
2024-11-25T12:44:20.292338584Z 2024-11-25 12:44:20,292 - sglang - INFO -                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292369450Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/managers/tp_worker.py", line 147, in forward_batch_generation
2024-11-25T12:44:20.292390642Z 2024-11-25 12:44:20,292 - sglang - INFO -     forward_batch = ForwardBatch.init_new(model_worker_batch, self.model_runner)
2024-11-25T12:44:20.292434844Z 2024-11-25 12:44:20,292 - sglang - INFO -                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-25T12:44:20.292452438Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 266, in init_new
2024-11-25T12:44:20.292496276Z 2024-11-25 12:44:20,292 - sglang - INFO -     ret.compute_mrope_positions(model_runner, batch)
2024-11-25T12:44:20.292534921Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/model_executor/forward_batch_info.py", line 190, in compute_mrope_positions
2024-11-25T12:44:20.292546295Z 2024-11-25 12:44:20,292 - sglang - INFO -     MRotaryEmbedding.get_input_positions(
2024-11-25T12:44:20.292570510Z 2024-11-25 12:44:20,292 - sglang - INFO -   File "/usr/local/lib/python3.11/dist-packages/sglang/srt/layers/rotary_embedding.py", line 48, in get_input_positions
2024-11-25T12:44:20.292595830Z 2024-11-25 12:44:20,292 - sglang - INFO -     image_grid_thw[image_index][0],
2024-11-25T12:44:20.292621161Z 2024-11-25 12:44:20,292 - sglang - INFO -     ~~~~~~~~~~~~~~^^^^^^^^^^^^^
2024-11-25T12:44:20.292645107Z 2024-11-25 12:44:20,292 - sglang - INFO - IndexError: list index out of range
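The last frame shows `image_grid_thw[image_index]` being indexed past the end of the list. One plausible cause, which is an assumption and not confirmed by this report, is that the tokenized prompt contains more image placeholders than there are entries in `image_grid_thw`. The sketch below shows a pre-flight check for that mismatch. The helper names are hypothetical, and the token ids are assumed values for Qwen2-VL that should be verified against the model's `config.json`.

```python
from typing import List, Sequence

# Assumed Qwen2-VL special-token ids; verify against the model's config.json.
VISION_START_TOKEN_ID = 151652
IMAGE_TOKEN_ID = 151655


def count_image_placeholders(input_ids: Sequence[int]) -> int:
    """Count positions where an image token directly follows a vision-start token."""
    return sum(
        1
        for prev, cur in zip(input_ids, input_ids[1:])
        if prev == VISION_START_TOKEN_ID and cur == IMAGE_TOKEN_ID
    )


def validate_image_inputs(
    input_ids: Sequence[int], image_grid_thw: List[List[int]]
) -> None:
    """Raise a descriptive error if the prompt references more images than were supplied."""
    expected = count_image_placeholders(input_ids)
    if expected > len(image_grid_thw):
        raise ValueError(
            f"prompt has {expected} image placeholders but only "
            f"{len(image_grid_thw)} image grid entries"
        )


# One image placeholder (a vision-start token followed by image tokens), one grid entry.
ok_ids = [1, VISION_START_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]
validate_image_inputs(ok_ids, [[1, 4, 4]])

# A second placeholder with no matching grid entry is rejected up front.
bad_ids = ok_ids + [VISION_START_TOKEN_ID, IMAGE_TOKEN_ID]
try:
    validate_image_inputs(bad_ids, [[1, 4, 4]])
except ValueError as exc:
    print("rejected:", exc)
```

Logging the offending request when such a check fails could also help capture the exact repro case that has been hard to obtain.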

Reproduction

This occurs with sglang v0.3.6 on an H100.

Environment

/bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
Python: 3.11.10 (main, Oct  3 2024, 07:29:13) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA H100 80GB HBM3
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Not Available
CUDA Driver Version: 525.147.05
PyTorch: 2.5.1+cu124
sglang: 0.3.6
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.46.3
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.10.11
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.26.2
interegular: 0.3.3
psutil: 6.1.0
pydantic: 2.10.1
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.55.0
anthropic: 0.39.0
NVIDIA Topology: 
        GPU0    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    CPU Affinity    NUMA Affinity
GPU0     X      PIX     NODE    NODE    NODE    SYS     SYS     0-47,96-143     0
NIC0    PIX      X      NODE    NODE    NODE    SYS     SYS
NIC1    NODE    NODE     X      PIX     NODE    SYS     SYS
NIC2    NODE    NODE    PIX      X      NODE    SYS     SYS
NIC3    NODE    NODE    NODE    NODE     X      SYS     SYS
NIC4    SYS     SYS     SYS     SYS     SYS      X      NODE
NIC5    SYS     SYS     SYS     SYS     SYS     NODE     X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5

ulimit soft: 1048576