vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: crash: RecursionError: maximum recursion depth exceeded #9608

Open wciq1208 opened 2 weeks ago

wciq1208 commented 2 weeks ago

Your current environment

The output of `python collect_env.py`

```text
Collecting environment information...
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.35

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.71.1.el7.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
BIOS Vendor ID: Red Hat
Model name: Intel(R) Xeon(R) Gold 6140M CPU @ 2.30GHz
BIOS Model name: RHEL 7.6.0 PC (i440FX + PIIX, 1996)
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 4
Stepping: 4
BogoMIPS: 4599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat umip pku ospke md_clear spec_ctrl intel_stibp arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 64 MiB (16 instances)
L3 cache: 64 MiB (4 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; Load fences, usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS (kernel), IBPB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-ml-py==12.560.30
[pip3] onnxruntime==1.16.3
[pip3] optree==0.12.1
[pip3] pyzmq==26.2.0
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.0
[pip3] torchelastic==0.2.2
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.2
[pip3] transformers-stream-generator==0.0.4
[pip3] triton==3.0.0
[conda] blas 1.0 mkl
[conda] cuda-cudart 12.1.105 0 nvidia
[conda] cuda-cupti 12.1.105 0 nvidia
[conda] cuda-libraries 12.1.0 0 nvidia
[conda] cuda-nvrtc 12.1.105 0 nvidia
[conda] cuda-nvtx 12.1.105 0 nvidia
[conda] cuda-opencl 12.5.39 0 nvidia
[conda] cuda-runtime 12.1.0 0 nvidia
[conda] cuda-version 12.5 3 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libcublas 12.1.0.26 0 nvidia
[conda] libcufft 11.0.2.4 0 nvidia
[conda] libcufile 1.10.1.7 0 nvidia
[conda] libcurand 10.3.6.82 0 nvidia
[conda] libcusolver 11.4.4.55 0 nvidia
[conda] libcusparse 12.0.2.55 0 nvidia
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] libnpp 12.0.2.50 0 nvidia
[conda] libnvjitlink 12.1.105 0 nvidia
[conda] libnvjpeg 12.1.1.14 0 nvidia
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py311h5eee18b_1
[conda] mkl_fft 1.3.8 py311h5eee18b_0
[conda] mkl_random 1.2.4 py311hdb19cb5_0
[conda] numpy 1.26.4 py311h08b1b3b_0
[conda] numpy-base 1.26.4 py311hf175353_0
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] optree 0.12.1 pypi_0 pypi
[conda] pytorch 2.4.0 py3.11_cuda12.1_cudnn9.1.0_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pyzmq 26.2.0 pypi_0 pypi
[conda] sentence-transformers 3.0.1 pypi_0 pypi
[conda] torchaudio 2.4.0 py311_cu121 pytorch
[conda] torchelastic 0.2.2 pypi_0 pypi
[conda] torchtriton 3.0.0 py311 pytorch
[conda] torchvision 0.19.0 py311_cu121 pytorch
[conda] transformers 4.45.2 pypi_0 pypi
[conda] transformers-stream-generator 0.0.4 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     0-15            0               N/A
GPU1    PHB      X      0-15            0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

Model Input Dumps

No response

🐛 Describe the bug

command:

vllm serve /hestia/model/Qwen2.5-14B-Instruct-AWQ --max-model-len 32768 --quantization awq --port 8001 --swap-space 0 --served-model-name qwen --num-gpu-blocks-override 2048
INFO:     127.0.0.1:42158 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 10-23 07:25:31 engine.py:158] RecursionError('maximum recursion depth exceeded')
ERROR 10-23 07:25:31 engine.py:158] Traceback (most recent call last):
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 156, in start
ERROR 10-23 07:25:31 engine.py:158]     self.run_engine_loop()
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 219, in run_engine_loop
ERROR 10-23 07:25:31 engine.py:158]     request_outputs = self.engine_step()
ERROR 10-23 07:25:31 engine.py:158]                       ^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 237, in engine_step
ERROR 10-23 07:25:31 engine.py:158]     raise e
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 228, in engine_step
ERROR 10-23 07:25:31 engine.py:158]     return self.engine.step()
ERROR 10-23 07:25:31 engine.py:158]            ^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1438, in step
ERROR 10-23 07:25:31 engine.py:158]     self._process_model_outputs(ctx=ctx)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1124, in _process_model_outputs
ERROR 10-23 07:25:31 engine.py:158]     self.output_processor.process_outputs(
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/output_processor/single_step.py", line 96, in process_outputs
ERROR 10-23 07:25:31 engine.py:158]     return self._process_sequence_group_outputs(sequence_group, outputs[0],
ERROR 10-23 07:25:31 engine.py:158]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/engine/output_processor/single_step.py", line 207, in _process_sequence_group_outputs
ERROR 10-23 07:25:31 engine.py:158]     scheduler.fork_seq(parent, seq)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1366, in fork_seq
ERROR 10-23 07:25:31 engine.py:158]     self.block_manager.fork(parent_seq, child_seq)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block_manager.py", line 332, in fork
ERROR 10-23 07:25:31 engine.py:158]     self.block_tables[child_seq.seq_id] = src_block_table.fork()
ERROR 10-23 07:25:31 engine.py:158]                                           ^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/block_table.py", line 207, in fork
ERROR 10-23 07:25:31 engine.py:158]     forked_blocks = self._allocator.fork(self._blocks[-1])
ERROR 10-23 07:25:31 engine.py:158]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/cpu_gpu_block_allocator.py", line 203, in fork
ERROR 10-23 07:25:31 engine.py:158]     return allocator.fork(last_block)
ERROR 10-23 07:25:31 engine.py:158]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/naive_block.py", line 165, in fork
ERROR 10-23 07:25:31 engine.py:158]     source_blocks = get_all_blocks_recursively(last_block)
ERROR 10-23 07:25:31 engine.py:158]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/common.py", line 359, in get_all_blocks_recursively
ERROR 10-23 07:25:31 engine.py:158]     recurse(last_block, all_blocks)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 10-23 07:25:31 engine.py:158]     recurse(block.prev_block, lst)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 10-23 07:25:31 engine.py:158]     recurse(block.prev_block, lst)
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 10-23 07:25:31 engine.py:158]     recurse(block.prev_block, lst)
ERROR 10-23 07:25:31 engine.py:158]   [Previous line repeated 977 more times]
ERROR 10-23 07:25:31 engine.py:158]   File "/opt/conda/lib/python3.11/site-packages/vllm/core/block/common.py", line 354, in recurse
ERROR 10-23 07:25:31 engine.py:158]     if block.prev_block is not None:
ERROR 10-23 07:25:31 engine.py:158]        ^^^^^^^^^^^^^^^^
ERROR 10-23 07:25:31 engine.py:158] RecursionError: maximum recursion depth exceeded
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
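
For anyone trying to see why this fails: the repeated `recurse(block.prev_block, lst)` frames in the log point to a helper that walks the block chain through `prev_block`, using one Python stack frame per block. The sketch below reproduces that failure mode on a toy linked list and shows an iterative walk that does not depend on the recursion limit. It is not the vLLM code or the maintainers' fix: `Block`, `build_chain`, and both `get_all_blocks_*` functions are hypothetical stand-ins, and the `lst.append(block)` step is an assumption about what follows the two lines visible in the traceback.

```python
import sys
from typing import List, Optional


class Block:
    """Hypothetical stand-in for a KV-cache block; only the back-pointer matters."""

    def __init__(self, block_id: int, prev_block: Optional["Block"] = None) -> None:
        self.block_id = block_id
        self.prev_block = prev_block


def build_chain(n: int) -> Block:
    """Build a chain of n blocks (ids 0..n-1) and return the last one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    last: Optional[Block] = None
    for i in range(n):
        last = Block(block_id=i, prev_block=last)
    assert last is not None
    return last


def get_all_blocks_recursive(last_block: Block) -> List[Block]:
    """Recursive walk shaped like the frames in the traceback (assumed body)."""

    def recurse(block: Block, lst: List[Block]) -> None:
        if block.prev_block is not None:
            recurse(block.prev_block, lst)  # one stack frame per block
        lst.append(block)

    all_blocks: List[Block] = []
    recurse(last_block, all_blocks)
    return all_blocks


def get_all_blocks_iterative(last_block: Block) -> List[Block]:
    """Same first-to-last ordering, but with a loop instead of recursion."""
    all_blocks: List[Block] = []
    block: Optional[Block] = last_block
    while block is not None:
        all_blocks.append(block)
        block = block.prev_block
    all_blocks.reverse()  # collected last-to-first, so flip the order
    return all_blocks


if __name__ == "__main__":
    # A chain longer than the interpreter's recursion limit.
    chain = build_chain(sys.getrecursionlimit() + 100)
    try:
        get_all_blocks_recursive(chain)
    except RecursionError as exc:
        print("recursive walk failed:", exc)
    print("iterative walk returned", len(get_all_blocks_iterative(chain)), "blocks")
```

Raising the limit with `sys.setrecursionlimit` would also let the recursive version go deeper, but it only moves the threshold; the loop removes the dependence on chain length.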


wciq1208 commented 1 week ago

Is anyone looking into this issue? It seems like it might be caused by Block Manager V2. @DarkLight1337

DarkLight1337 commented 1 week ago

cc @WoosukKwon

jianshuod commented 4 days ago

As a workaround, downgrading to v0.6.0 works for me.

JimXiongGM commented 13 hours ago

Same here. Version: v0.6.3.post1

ERROR 11-06 16:00:35 engine.py:158] RecursionError('maximum recursion depth exceeded')
ERROR 11-06 16:00:35 engine.py:158] Traceback (most recent call last):
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 156, in start
ERROR 11-06 16:00:35 engine.py:158]     self.run_engine_loop()
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 219, in run_engine_loop
ERROR 11-06 16:00:35 engine.py:158]     request_outputs = self.engine_step()
ERROR 11-06 16:00:35 engine.py:158]                       ^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 237, in engine_step
ERROR 11-06 16:00:35 engine.py:158]     raise e
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 228, in engine_step
ERROR 11-06 16:00:35 engine.py:158]     return self.engine.step()
ERROR 11-06 16:00:35 engine.py:158]            ^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 1438, in step
ERROR 11-06 16:00:35 engine.py:158]     self._process_model_outputs(ctx=ctx)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 1124, in _process_model_outputs
ERROR 11-06 16:00:35 engine.py:158]     self.output_processor.process_outputs(
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/output_processor/single_step.py", line 96, in process_outputs
ERROR 11-06 16:00:35 engine.py:158]     return self._process_sequence_group_outputs(sequence_group, outputs[0],
ERROR 11-06 16:00:35 engine.py:158]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/engine/output_processor/single_step.py", line 207, in _process_sequence_group_outputs
ERROR 11-06 16:00:35 engine.py:158]     scheduler.fork_seq(parent, seq)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/scheduler.py", line 1366, in fork_seq
ERROR 11-06 16:00:35 engine.py:158]     self.block_manager.fork(parent_seq, child_seq)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block_manager.py", line 332, in fork
ERROR 11-06 16:00:35 engine.py:158]     self.block_tables[child_seq.seq_id] = src_block_table.fork()
ERROR 11-06 16:00:35 engine.py:158]                                           ^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/block_table.py", line 207, in fork
ERROR 11-06 16:00:35 engine.py:158]     forked_blocks = self._allocator.fork(self._blocks[-1])
ERROR 11-06 16:00:35 engine.py:158]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/cpu_gpu_block_allocator.py", line 203, in fork
ERROR 11-06 16:00:35 engine.py:158]     return allocator.fork(last_block)
ERROR 11-06 16:00:35 engine.py:158]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/prefix_caching_block.py", line 364, in fork
ERROR 11-06 16:00:35 engine.py:158]     source_blocks = get_all_blocks_recursively(last_block)
ERROR 11-06 16:00:35 engine.py:158]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/common.py", line 359, in get_all_blocks_recursively
ERROR 11-06 16:00:35 engine.py:158]     recurse(last_block, all_blocks)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 11-06 16:00:35 engine.py:158]     recurse(block.prev_block, lst)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 11-06 16:00:35 engine.py:158]     recurse(block.prev_block, lst)
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/common.py", line 355, in recurse
ERROR 11-06 16:00:35 engine.py:158]     recurse(block.prev_block, lst)
ERROR 11-06 16:00:35 engine.py:158]   [Previous line repeated 977 more times]
ERROR 11-06 16:00:35 engine.py:158]   File "/home/test/miniconda3/lib/python3.12/site-packages/vllm/core/block/common.py", line 354, in recurse
ERROR 11-06 16:00:35 engine.py:158]     if block.prev_block is not None:
ERROR 11-06 16:00:35 engine.py:158]        ^^^^^^^^^^^^^^^^
ERROR 11-06 16:00:35 engine.py:158] RecursionError: maximum recursion depth exceeded
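
Both tracebacks report `[Previous line repeated 977 more times]`, which is consistent with hitting CPython's default recursion limit of 1000 once the outer frames are counted. A rough way to estimate when a sequence gets long enough to trigger this is to convert tokens to blocks. The sketch below assumes a block size of 16 tokens (believed to be the vLLM default on CUDA, but not confirmed anywhere in this thread) and treats the number of frames already on the stack as a guess; both helper names are hypothetical.

```python
import math


def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    """Number of KV-cache blocks a sequence of num_tokens occupies.

    block_size=16 is an assumption, not a value taken from this issue.
    """
    return math.ceil(num_tokens / block_size)


def max_tokens_before_overflow(
    recursion_limit: int, frames_in_use: int, block_size: int = 16
) -> int:
    """Rough upper bound on sequence length before a one-frame-per-block
    recursive walk runs out of stack frames.

    frames_in_use is the number of frames already on the stack when the
    walk starts; it has to be estimated from the traceback.
    """
    return (recursion_limit - frames_in_use) * block_size


if __name__ == "__main__":
    # --max-model-len 32768 from the original command.
    print("blocks at max model length:", blocks_needed(32768))
    # 1000 is CPython's default limit; 20 frames in use is a guess.
    print("approx. token threshold:", max_tokens_before_overflow(1000, 20))
```

Under these assumptions a full 32768-token sequence spans about twice as many blocks as the default recursion limit allows, so a fork on a sufficiently long sequence would be expected to fail regardless of which allocator (`naive_block.py` or `prefix_caching_block.py`) is in use.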