vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Task finished unexpectedly when running a WizardCoder request #1188

Closed: szc900311 closed this issue 11 months ago

szc900311 commented 11 months ago

GPU ENV:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   32C    P0    37W / 300W |  31573MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                 GPU Memory  |
|        ID   ID                                                  Usage       |
|=============================================================================|
|    0   N/A  N/A   3406559      C   python                         31561MiB  |
+-----------------------------------------------------------------------------+
```

Start command:

```
(base) [root@VM-114-86-tencentos ~]# python -m vllm.entrypoints.api_server --model WizardLM/WizardCoder-Python-13B-V1.0 --gpu-memory-utilization 0.95 --host=0.0.0.0
```

Log:

```

INFO 09-27 11:33:57 async_llm_engine.py:328] Received request d3ec023c4cab40299a715adf331ac01d: prompt: 'You are an AI unit testing expert. Your task is to help users write high-quality unit test cases by providing guidance on best practices, such as test coverage, test independence, test simplicity, appropriate assertions, avoiding hardcoding, test code maintenance, using testing frameworks, mocking and stubbing, code testability, and continuous integration. Ensure that the suggestions and advice you provide help users create effective and maintainable test cases.\n\nThe user has provided a code snippet in a certain programming language that implements a server for handling various operations. Identify the programming language used in the provided code snippet, and write unit test cases for it, taking into account the specific language and functionality\n\n### Code Snippet:\n\n```go\nfunc (bm *ZippedBitmap) Zip() {\n\tif bm.isZip {\n\t\treturn\n\t}\n\tvar (\n\t\tcursor int8 = -1\n\t\tcursorLen uint32 = 0\n\t\tnewpos uint32 = 0\n\t)\n\tfor _, idxV := range bm.data {\n\t\tif idxV == 0x00 {\n\t\t\t// current byte is a run of 0s\n\t\t\tif cursor == 1 {\n\t\t\t\t// tail was a run of 1s: compress and store it, new tail length is 1\n\t\t\t\tnewpos += bm.ZipNode((uint8)(cursor), cursorLen, newpos)\n\t\t\t\tcursorLen = 1\n\t\t\t} else {\n\t\t\t\tcursorLen++\n\t\t\t}\n\t\t\tcursor = 0\n\t\t} else if idxV == 0xff {\n\t\t\t// current byte is a run of 1s\n\t\t\tif cursor == 0 {\n\t\t\t\t// tail was a run of 0s: compress and store it, new tail length is 1\n\t\t\t\tnewpos += bm.ZipNode((uint8)(cursor), cursorLen, newpos)\n\t\t\t\tcursorLen = 1\n\t\t\t} else {\n\t\t\t\tcursorLen++\n\t\t\t}\n\t\t\tcursor = 1\n\t\t} else {\n\t\t\t// incompressible segment\n\t\t\tif cursor > -1 {\n\t\t\t\t// a pending run exists\n\t\t\t\tif cursorLen > 1 {\n\t\t\t\t\tnewpos += bm.ZipNode((uint8)(cursor), cursorLen, newpos)\n\t\t\t\t} else {\n\t\t\t\t\tbm.data[newpos] = uint8(0x3f * cursor)\n\t\t\t\t\tnewpos++\n\t\t\t\t}\n\t\t\t\tcursor = -1\n\t\t\t\tcursorLen = 0\n\t\t\t}\n\t\t\tbm.data[newpos] = idxV\n\t\t\tnewpos++\n\t\t}\n\t}\n\tif cursorLen > 0 {\n\t\tnewpos += bm.ZipNode((uint8)(cursor), cursorLen, newpos)\n\t}\n\t// actual length\n\tbm.data = bm.data[:newpos]\n\tbm.isZip = true\n}\n```\n\n### Unit Test Cases:', sampling params: SamplingParams(n=4, best_of=4, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.1, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], ignore_eos=False, max_tokens=3000, logprobs=None), prompt token ids: None.
INFO 09-27 11:33:58 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 11.3%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:03 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 78.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 17.5%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:08 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 23.0%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:13 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 75.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 28.5%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:18 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 34.9%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:23 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 73.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 39.5%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:28 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 73.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 45.1%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:33 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 75.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 50.6%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:38 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 66.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 55.2%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:43 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 72.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 60.7%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:48 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 71.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 65.3%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:53 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 66.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 70.8%, CPU KV cache usage: 0.0%
INFO 09-27 11:34:58 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 73.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 75.4%, CPU KV cache usage: 0.0%
INFO 09-27 11:35:03 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 68.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 80.9%, CPU KV cache usage: 0.0%
INFO 09-27 11:35:08 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 69.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 85.5%, CPU KV cache usage: 0.0%
INFO 09-27 11:35:13 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 67.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 91.0%, CPU KV cache usage: 0.0%
INFO 09-27 11:35:18 llm_engine.py:613] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 66.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 95.6%, CPU KV cache usage: 0.0%
Exception in callback _raise_exception_on_finish(request_tracker=)(<Task finishe...this error.')>) at /data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py:21
handle: <Handle _raise_exception_on_finish(request_tracker=)(<Task finishe...this error.')>) at /data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py:21>
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 316, in run_engine_loop
    await self.engine_step()
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 301, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 168, in step_async
    early_return) = self._schedule()
                    ^^^^^^^^^^^^^^^^
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 295, in _schedule
    seq_group_metadata_list, scheduler_outputs = self.scheduler.schedule()
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 266, in schedule
    scheduler_outputs = self._schedule()
                        ^^^^^^^^^^^^^^^^
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 210, in _schedule
    self._preempt(seq_group, blocks_to_swap_out)
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 344, in _preempt
    self._preempt_by_swap(seq_group, blocks_to_swap_out)
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 366, in _preempt_by_swap
    self._swap_out(seq_group, blocks_to_swap_out)
  File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 387, in _swap_out
    raise RuntimeError(
RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/data/anaconda3/lib/python3.11/asyncio/events.py", line 80, in _run self._context.run(self._callback, *self._args) File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish raise exc File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 31, in _raise_exception_on_finish raise AsyncEngineDeadError( vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause. INFO 09-27 11:35:22 async_llm_engine.py:120] Aborted request d3ec023c4cab40299a715adf331ac01d. INFO: 10.91.41.69:52237 - "POST /generate HTTP/1.1" 500 Internal Server Error ERROR: Exception in ASGI application Traceback (most recent call last): File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 27, in _raise_exception_on_finish task.result() File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 316, in run_engine_loop await self.engine_step() File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 301, in engine_step request_outputs = await self.engine.step_async() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 168, in step_async early_return) = self._schedule() ^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 295, in _schedule seq_group_metadata_list, scheduler_outputs = self.scheduler.schedule() ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 266, in schedule scheduler_outputs = self._schedule() ^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 210, in _schedule self._preempt(seq_group, blocks_to_swap_out) File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 344, in _preempt self._preempt_by_swap(seq_group, blocks_to_swap_out) File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 366, in _preempt_by_swap self._swap_out(seq_group, blocks_to_swap_out) File "/data/anaconda3/lib/python3.11/site-packages/vllm/core/scheduler.py", line 387, in _swap_out raise RuntimeError( RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/data/anaconda3/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi result = await app( # type: ignore[func-returns-value] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in call return await self.app(scope, receive, send) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/fastapi/applications.py", line 292, in call await super().call(scope, receive, send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/applications.py", line 122, in call await self.middleware_stack(scope, receive, send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in call raise exc File "/data/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in call await self.app(scope, receive, _send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in call raise exc File "/data/anaconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in call await self.app(scope, receive, sender) File "/data/anaconda3/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call raise e File "/data/anaconda3/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call await self.app(scope, receive, send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 718, in call await route.handle(scope, receive, send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle await self.app(scope, receive, send) File "/data/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 66, in app response = await func(request) ^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/fastapi/routing.py", line 273, in app raw_response = await run_endpoint_function( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/fastapi/routing.py", line 190, in run_endpoint_function return await dependant.call(*values) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/anaconda3/lib/python3.11/site-packages/vllm/entrypoints/api_server.py", line 58, in generate async for request_output in results_generator: File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 391, in generate raise e File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 386, in generate async for request_output in stream: File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 69, in anext raise result File "/data/anaconda3/lib/python3.11/asyncio/events.py", line 80, in _run self._context.run(self._callback, self._args) File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish raise exc File "/data/anaconda3/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 31, in _raise_exception_on_finish raise AsyncEngineDeadError( vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause. `

esmeetu commented 11 months ago

This is because you don't have enough swap space; you should increase it. You can run `free -h` in your system shell to see the current swap space.
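For context, the "CPU swap space" in the error is vLLM's own swap area for KV-cache blocks of preempted sequences (see the `scheduler.py` `_swap_out` frame in the traceback), and it is sized by the engine's `--swap-space` argument in GiB per GPU (default 4). A minimal sketch of the original start command with a larger value; the 16 GiB figure is only an illustrative choice and should be tuned to the host's free RAM:

```shell
# Check OS-level memory and swap usage, as suggested above
free -h

# Restart the server with more CPU swap space for the vLLM scheduler.
# --swap-space is in GiB per GPU; 16 here is an arbitrary example value.
python -m vllm.entrypoints.api_server \
    --model WizardLM/WizardCoder-Python-13B-V1.0 \
    --gpu-memory-utilization 0.95 \
    --host=0.0.0.0 \
    --swap-space 16
```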

szc900311 commented 11 months ago

thx

chi2liu commented 8 months ago

At present, we have found a workaround: set the swap space directly to 0. That way vLLM never touches the CPU swap space and no error is raised. The number of CPU blocks also becomes 0, which may slow things down a bit, but at least the server does not hang and die.
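Concretely, that workaround should map to the same engine argument set to zero, e.g.:

```shell
# Sketch of the workaround described above: allocate no CPU swap blocks.
# With --swap-space 0 the swap path is never taken, so the error above is
# avoided; per the report above, this may cost some generation speed.
python -m vllm.entrypoints.api_server \
    --model WizardLM/WizardCoder-Python-13B-V1.0 \
    --gpu-memory-utilization 0.95 \
    --host=0.0.0.0 \
    --swap-space 0
```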