vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred requests: "vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already." #6421

Open ZHJ19970917 opened 1 month ago

ZHJ19970917 commented 1 month ago

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-73-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA A100-PCIE-40GB
Nvidia driver version: 535.129.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          80
On-line CPU(s) list:             0-79
Vendor ID:                       GenuineIntel
Model name:                      Intel Xeon Processor (Skylake, IBRS)
CPU family:                      6
Model:                           85
Thread(s) per core:              2
Core(s) per socket:              20
Socket(s):                       2
Stepping:                        4
BogoMIPS:                        5986.22
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat
L1d cache:                       2.5 MiB (80 instances)
L1i cache:                       2.5 MiB (80 instances)
L2 cache:                        160 MiB (40 instances)
L3 cache:                        32 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-39
NUMA node1 CPU(s):               40-79
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Mitigation; IBRS
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

Versions of relevant libraries:
[pip3] numpy==1.26.4
[conda] numpy                     1.26.4                   pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-79    0-1             N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

frame #27: _PyEval_EvalFrameDefault + 0x53d6 (0x4f34c6 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #28: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #29: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #31: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #32: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #34: _PyFunction_Vectorcall + 0x6f (0x4fe0cf in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #35: _PyObject_FastCallDictTstate + 0x17d (0x4f681d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #36: _PyObject_Call_Prepend + 0x66 (0x507f36 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #37: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #38: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x5757 (0x4f3847 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #40: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #41: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #42: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #43: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #44: _PyObject_FastCallDictTstate + 0xcd (0x4f676d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #45: _PyObject_Call_Prepend + 0xe0 (0x507fb0 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #46: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #47: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #48: _PyEval_EvalFrameDefault + 0x4dde (0x4f2ece in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #49: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #50: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #51: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #52: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #53: _PyObject_FastCallDictTstate + 0xcd (0x4f676d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #54: _PyObject_Call_Prepend + 0x66 (0x507f36 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #55: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #56: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #57: _PyEval_EvalFrameDefault + 0x53d6 (0x4f34c6 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #58: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #59: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #61: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #62: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #63: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/api/app.py", line 85, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/api/chat.py", line 132, in create_chat_completion_response
    responses = await chat_model.achat(
  File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 56, in achat
    return await self.engine.chat(messages, system, tools, image, **input_kwargs)
  File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 178, in chat
    async for request_output in generator:
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 662, in generate
    async for output in self._process_request(
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 756, in _process_request
    stream = await self.add_request(
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 561, in add_request
    self.start_background_loop()
  File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 431, in start_background_loop
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
INFO: 127.0.0.1:51326 - "GET /.env HTTP/1.1" 404 Not Found

DarkLight1337 commented 1 month ago

We have a tracking issue (https://github.com/vllm-project/vllm/issues/5901) for this. Please provide more details there so we can better troubleshoot the underlying cause.