heya5 opened 1 month ago
+1.
I'm experiencing the same error when gathering tasks.
My full error message is:
INFO 08-16 18:14:22 llm_engine.py:175] Initializing an LLM engine (v0.5.2) with config: model='meta-llama/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='meta-llama/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=meta-llama/Meta-Llama-3-8B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 08-16 18:14:22 cpu_executor.py:134] CUDA graph is not supported on CPU, fallback to the eager mode.
WARNING 08-16 18:14:22 cpu_executor.py:161] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
INFO 08-16 18:14:23 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-16 18:14:23 selector.py:66] Using Torch SDPA backend.
INFO 08-16 18:14:23 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-16 18:14:23 selector.py:66] Using Torch SDPA backend.
INFO 08-16 18:14:24 weight_utils.py:219] Using model weights format ['*.safetensors']
INFO 08-16 18:15:58 cpu_executor.py:74] # CPU blocks: 2048
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 8c5a8b9611a34bb2b4dfa010e1ea2cb1: prompt: 'A robot may not injure a human being', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 8c5a8b9611a34bb2b4dfa010e1ea2cb1: prompt: 'A robot may not injure a human being', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 9e711ba716054ec6ad8b5cb077266b1a: prompt: 'To be or not to be, finish this poem', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 9e711ba716054ec6ad8b5cb077266b1a: prompt: 'To be or not to be, finish this poem', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 5e3e4c1e45574baf8b85d660e90bc996: prompt: 'What is the meaning of life?', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 5e3e4c1e45574baf8b85d660e90bc996: prompt: 'What is the meaning of life?', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:21 metrics.py:396] Avg prompt throughput: 2.7 tokens/s, Avg generation throughput: -1.1 tokens/s, Running: 6 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
ERROR 08-16 18:16:21 async_llm_engine.py:55] Engine background task failed
ERROR 08-16 18:16:21 async_llm_engine.py:55] Traceback (most recent call last):
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
ERROR 08-16 18:16:21 async_llm_engine.py:55] return_value = task.result()
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
ERROR 08-16 18:16:21 async_llm_engine.py:55] result = task.result()
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
ERROR 08-16 18:16:21 async_llm_engine.py:55] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
ERROR 08-16 18:16:21 async_llm_engine.py:55] self.do_log_stats(scheduler_outputs, output)
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
ERROR 08-16 18:16:21 async_llm_engine.py:55] logger.log(self._get_stats(scheduler_outputs, model_output))
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
ERROR 08-16 18:16:21 async_llm_engine.py:55] self._log_prometheus(stats)
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
ERROR 08-16 18:16:21 async_llm_engine.py:55] self._log_counter(self.metrics.counter_generation_tokens,
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
ERROR 08-16 18:16:21 async_llm_engine.py:55] counter.labels(**self.labels).inc(data)
ERROR 08-16 18:16:21 async_llm_engine.py:55] File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
ERROR 08-16 18:16:21 async_llm_engine.py:55] raise ValueError('Counters can only be incremented by non-negative amounts.')
ERROR 08-16 18:16:21 async_llm_engine.py:55] ValueError: Counters can only be incremented by non-negative amounts.
Exception in callback _log_task_completion(error_callback=<bound method...7fb4573ef6a0>>)(<Task finishe...ve amounts.')>) at /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py:35
handle: <Handle _log_task_completion(error_callback=<bound method...7fb4573ef6a0>>)(<Task finishe...ve amounts.')>) at /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py:35>
Traceback (most recent call last):
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
return_value = task.result()
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
result = task.result()
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
self.do_log_stats(scheduler_outputs, output)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
logger.log(self._get_stats(scheduler_outputs, model_output))
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
self._log_prometheus(stats)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
self._log_counter(self.metrics.counter_generation_tokens,
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
counter.labels(**self.labels).inc(data)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
raise ValueError('Counters can only be incremented by non-negative amounts.')
ValueError: Counters can only be incremented by non-negative amounts.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 57, in _log_task_completion
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 8c5a8b9611a34bb2b4dfa010e1ea2cb1.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 9e711ba716054ec6ad8b5cb077266b1a.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 5e3e4c1e45574baf8b85d660e90bc996.
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 82, in <module>
[rank0]: responses = asyncio.run(run_engine(llm_engine, sampling_params, prompts))
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/runners.py", line 44, in run
[rank0]: return loop.run_until_complete(main)
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[rank0]: return future.result()
[rank0]: File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 73, in run_engine
[rank0]: responses = await asyncio.gather(*(process_requests(llm_engine, sampling_params, prompt) for prompt in prompts))
[rank0]: File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]: async for request_output in results_generator:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]: async for output in self._process_request(
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]: raise e
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]: async for request_output in stream:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]: raise result
[rank0]: File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]: async for request_output in results_generator:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]: async for output in self._process_request(
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]: raise e
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]: async for request_output in stream:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]: raise result
[rank0]: File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]: async for request_output in results_generator:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]: async for output in self._process_request(
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]: raise e
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]: async for request_output in stream:
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]: raise result
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
[rank0]: return_value = task.result()
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
[rank0]: result = task.result()
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
[rank0]: request_outputs = await self.engine.step_async(virtual_engine)
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
[rank0]: self.do_log_stats(scheduler_outputs, output)
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
[rank0]: logger.log(self._get_stats(scheduler_outputs, model_output))
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
[rank0]: self._log_prometheus(stats)
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
[rank0]: self._log_counter(self.metrics.counter_generation_tokens,
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
[rank0]: counter.labels(**self.labels).inc(data)
[rank0]: File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
[rank0]: raise ValueError('Counters can only be incremented by non-negative amounts.')
[rank0]: ValueError: Counters can only be incremented by non-negative amounts.
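The root cause at the bottom of the traceback is prometheus_client rejecting a negative increment: the stats logger computed a negative generation-token delta (note the "Avg generation throughput: -1.1 tokens/s" line in the metrics output) and passed it to `Counter.inc`, which only accepts non-negative amounts because Prometheus counters are monotonic. A minimal pure-Python sketch of that guard (a simplified stand-in for illustration, not the real prometheus_client implementation):

```python
class Counter:
    """Simplified stand-in for a Prometheus counter (monotonic metric)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1):
        # Counters may only ever grow; a negative delta is a logic error
        # upstream, so it is rejected rather than silently applied.
        if amount < 0:
            raise ValueError(
                'Counters can only be incremented by non-negative amounts.')
        self.value += amount


c = Counter()
c.inc(5)          # fine: counter grows to 5.0
try:
    c.inc(-1.1)   # e.g. a negative generation-token delta from the stats logger
except ValueError as e:
    print(e)      # prints the same message seen in the traceback
```

So the `ValueError` is a symptom: the actual bug is that vLLM's metrics path produced a negative token count on the CPU backend and fed it straight into the counter.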
Your current environment

🐛 Describe the bug

I got the error in the vLLM server when I use

results = await asyncio.gather(*tasks)

in the client. The error log is the same as the traceback shown above.
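For context, the `asyncio.gather` pattern that triggers the crash can be sketched as follows. `fetch_completion` is a hypothetical placeholder for the actual call into the async engine (the issue's real client code is not shown), but the concurrency shape is the same: all prompts are submitted at once, so the engine processes them in one batch and a failure in the engine's background stats task aborts every gathered request together.

```python
import asyncio


async def fetch_completion(prompt: str) -> str:
    # Placeholder for an actual request to the vLLM server / AsyncLLMEngine;
    # the real client streams RequestOutputs from a results generator.
    await asyncio.sleep(0)
    return f"response to: {prompt}"


async def main() -> list:
    prompts = [
        "A robot may not injure a human being",
        "To be or not to be, finish this poem",
        "What is the meaning of life?",
    ]
    tasks = [fetch_completion(p) for p in prompts]
    # All tasks run concurrently; if the engine loop dies (as in the
    # traceback above), every pending task receives the same exception.
    return await asyncio.gather(*tasks)


responses = asyncio.run(main())
```

This matches the log above: three requests are received back to back, and when the background task raises `AsyncEngineDeadError`, all three are aborted at once.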