vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: error `Counters can only be incremented by non-negative amounts.` in `metrics` module #6642

Open heya5 opened 1 month ago

heya5 commented 1 month ago

Your current environment

I hit this error in the vLLM server when my client submits requests concurrently with `results = await asyncio.gather(*tasks)`.
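For reference, a minimal sketch of the client pattern that triggers it (the model choice and the `process_one` helper are illustrative, not my exact code):

```python
import asyncio

from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams
from vllm.utils import random_uuid

engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Meta-Llama-3-8B-Instruct"))

async def process_one(prompt: str) -> str:
    params = SamplingParams(temperature=0.2, max_tokens=128)
    final = None
    # generate() is an async generator; the last yielded item is the full output
    async for output in engine.generate(prompt, params, random_uuid()):
        final = output
    return final.outputs[0].text

async def main():
    prompts = [
        "A robot may not injure a human being",
        "To be or not to be, finish this poem",
    ]
    # submitting many requests concurrently -- this is where the engine dies
    results = await asyncio.gather(*(process_one(p) for p in prompts))
    return results

asyncio.run(main())
```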

🐛 Describe the bug

Here is the error log:

ERROR 07-22 09:54:47 async_llm_engine.py:52] Engine background task failed
ERROR 07-22 09:54:47 async_llm_engine.py:52] Traceback (most recent call last):
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
ERROR 07-22 09:54:47 async_llm_engine.py:52]     return_value = task.result()
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
ERROR 07-22 09:54:47 async_llm_engine.py:52]     has_requests_in_progress = await asyncio.wait_for(
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 07-22 09:54:47 async_llm_engine.py:52]     return fut.result()
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
ERROR 07-22 09:54:47 async_llm_engine.py:52]     request_outputs = await self.engine.step_async()
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 245, in step_async
ERROR 07-22 09:54:47 async_llm_engine.py:52]     self.do_log_stats(scheduler_outputs, output)
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 804, in do_log_stats
ERROR 07-22 09:54:47 async_llm_engine.py:52]     self.stat_logger.log(
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 322, in log
ERROR 07-22 09:54:47 async_llm_engine.py:52]     self._log_prometheus(stats)
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 256, in _log_prometheus
ERROR 07-22 09:54:47 async_llm_engine.py:52]     self._log_counter(self.metrics.counter_generation_tokens,
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 288, in _log_counter
ERROR 07-22 09:54:47 async_llm_engine.py:52]     counter.labels(**self.labels).inc(data)
ERROR 07-22 09:54:47 async_llm_engine.py:52]   File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
ERROR 07-22 09:54:47 async_llm_engine.py:52]     raise ValueError('Counters can only be incremented by non-negative amounts.')
ERROR 07-22 09:54:47 async_llm_engine.py:52] ValueError: Counters can only be incremented by non-negative amounts.
Exception in callback functools.partial(<function _log_task_completion at 0x7f9e82a6e7a0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f9e7dfea380>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f9e82a6e7a0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f9e7dfea380>>)>
Traceback (most recent call last):
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 245, in step_async
    self.do_log_stats(scheduler_outputs, output)
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 804, in do_log_stats
    self.stat_logger.log(
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 322, in log
    self._log_prometheus(stats)
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 256, in _log_prometheus
    self._log_counter(self.metrics.counter_generation_tokens,
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/metrics.py", line 288, in _log_counter
    counter.labels(**self.labels).inc(data)
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
    raise ValueError('Counters can only be incremented by non-negative amounts.')
ValueError: Counters can only be incremented by non-negative amounts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/a100user/miniconda3/envs/glm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
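For what it's worth, the traceback bottoms out in `prometheus_client`: `_log_counter` in `vllm/engine/metrics.py` passes the per-interval generation-token count straight to `Counter.inc()`, and Prometheus counters reject negative amounts by design. A two-line illustration of that client-library behavior (the metric name here is made up):

```python
from prometheus_client import Counter

generation_tokens = Counter("generation_tokens", "Tokens generated")  # illustrative name
generation_tokens.inc(-1)  # ValueError: Counters can only be incremented by non-negative amounts.
```

So the real question is why the engine's stats ever produce a negative token delta in the first place.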
gracequeen commented 3 weeks ago

+1.

I'm experiencing the same error when submitting requests concurrently with asyncio.gather.

gracequeen commented 3 weeks ago

My full error message is:

INFO 08-16 18:14:22 llm_engine.py:175] Initializing an LLM engine (v0.5.2) with config: model='meta-llama/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='meta-llama/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=meta-llama/Meta-Llama-3-8B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 08-16 18:14:22 cpu_executor.py:134] CUDA graph is not supported on CPU, fallback to the eager mode.
WARNING 08-16 18:14:22 cpu_executor.py:161] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
INFO 08-16 18:14:23 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-16 18:14:23 selector.py:66] Using Torch SDPA backend.
INFO 08-16 18:14:23 selector.py:117] Cannot use _Backend.FLASH_ATTN backend on CPU.
INFO 08-16 18:14:23 selector.py:66] Using Torch SDPA backend.
INFO 08-16 18:14:24 weight_utils.py:219] Using model weights format ['*.safetensors']
INFO 08-16 18:15:58 cpu_executor.py:74] # CPU blocks: 2048
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 8c5a8b9611a34bb2b4dfa010e1ea2cb1: prompt: 'A robot may not injure a human being', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 9e711ba716054ec6ad8b5cb077266b1a: prompt: 'To be or not to be, finish this poem', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:00 async_llm_engine.py:670] Received request 5e3e4c1e45574baf8b85d660e90bc996: prompt: 'What is the meaning of life?', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.2, top_p=0.95, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: None, lora_request: None.
INFO 08-16 18:16:21 metrics.py:396] Avg prompt throughput: 2.7 tokens/s, Avg generation throughput: -1.1 tokens/s, Running: 6 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
ERROR 08-16 18:16:21 async_llm_engine.py:55] Engine background task failed
ERROR 08-16 18:16:21 async_llm_engine.py:55] Traceback (most recent call last):
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
ERROR 08-16 18:16:21 async_llm_engine.py:55]     return_value = task.result()
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
ERROR 08-16 18:16:21 async_llm_engine.py:55]     result = task.result()
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
ERROR 08-16 18:16:21 async_llm_engine.py:55]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
ERROR 08-16 18:16:21 async_llm_engine.py:55]     self.do_log_stats(scheduler_outputs, output)
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
ERROR 08-16 18:16:21 async_llm_engine.py:55]     logger.log(self._get_stats(scheduler_outputs, model_output))
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
ERROR 08-16 18:16:21 async_llm_engine.py:55]     self._log_prometheus(stats)
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
ERROR 08-16 18:16:21 async_llm_engine.py:55]     self._log_counter(self.metrics.counter_generation_tokens,
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
ERROR 08-16 18:16:21 async_llm_engine.py:55]     counter.labels(**self.labels).inc(data)
ERROR 08-16 18:16:21 async_llm_engine.py:55]   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
ERROR 08-16 18:16:21 async_llm_engine.py:55]     raise ValueError('Counters can only be incremented by non-negative amounts.')
ERROR 08-16 18:16:21 async_llm_engine.py:55] ValueError: Counters can only be incremented by non-negative amounts.
Exception in callback _log_task_completion(error_callback=<bound method...7fb4573ef6a0>>)(<Task finishe...ve amounts.')>) at /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py:35
handle: <Handle _log_task_completion(error_callback=<bound method...7fb4573ef6a0>>)(<Task finishe...ve amounts.')>) at /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py:35>
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
    return_value = task.result()
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
    result = task.result()
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
    self.do_log_stats(scheduler_outputs, output)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
    logger.log(self._get_stats(scheduler_outputs, model_output))
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
    self._log_prometheus(stats)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
    self._log_counter(self.metrics.counter_generation_tokens,
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
    counter.labels(**self.labels).inc(data)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
    raise ValueError('Counters can only be incremented by non-negative amounts.')
ValueError: Counters can only be incremented by non-negative amounts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 57, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 8c5a8b9611a34bb2b4dfa010e1ea2cb1.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 9e711ba716054ec6ad8b5cb077266b1a.
INFO 08-16 18:16:21 async_llm_engine.py:170] Aborted request 5e3e4c1e45574baf8b85d660e90bc996.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 82, in <module>
[rank0]:     responses = asyncio.run(run_engine(llm_engine, sampling_params, prompts))
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/runners.py", line 44, in run
[rank0]:     return loop.run_until_complete(main)
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[rank0]:     return future.result()
[rank0]:   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 73, in run_engine
[rank0]:     responses = await asyncio.gather(*(process_requests(llm_engine, sampling_params, prompt) for prompt in prompts))
[rank0]:   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]:     async for request_output in results_generator:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]:     async for output in self._process_request(
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]:     raise e
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]:     async for request_output in stream:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]:     raise result
[rank0]:   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]:     async for request_output in results_generator:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]:     async for output in self._process_request(
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]:     raise e
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]:     async for request_output in stream:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]:     raise result
[rank0]:   File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/search-research/code/Users/grace.qin/search/src/humaneval/async_run.py", line 59, in process_requests
[rank0]:     async for request_output in results_generator:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 773, in generate
[rank0]:     async for output in self._process_request(
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 889, in _process_request
[rank0]:     raise e
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 885, in _process_request
[rank0]:     async for request_output in stream:
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 92, in __anext__
[rank0]:     raise result
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 45, in _log_task_completion
[rank0]:     return_value = task.result()
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 618, in run_engine_loop
[rank0]:     result = task.result()
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 561, in engine_step
[rank0]:     request_outputs = await self.engine.step_async(virtual_engine)
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 253, in step_async
[rank0]:     self.do_log_stats(scheduler_outputs, output)
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 938, in do_log_stats
[rank0]:     logger.log(self._get_stats(scheduler_outputs, model_output))
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 530, in log
[rank0]:     self._log_prometheus(stats)
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 487, in _log_prometheus
[rank0]:     self._log_counter(self.metrics.counter_generation_tokens,
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/vllm-0.5.2+cpu-py3.10-linux-x86_64.egg/vllm/engine/metrics.py", line 455, in _log_counter
[rank0]:     counter.labels(**self.labels).inc(data)
[rank0]:   File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/prometheus_client/metrics.py", line 313, in inc
[rank0]:     raise ValueError('Counters can only be incremented by non-negative amounts.')
[rank0]: ValueError: Counters can only be incremented by non-negative amounts.
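Note the stats line logged right before the crash: `Avg generation throughput: -1.1 tokens/s`. That suggests the stat logger computed a negative generation-token delta for the interval, and the same negative value then reaches `Counter.inc()` and takes down the whole engine background loop. Two things that may help until the root cause is found (neither verified against this exact setup):

- Starting the engine with `--disable-log-stats` (or `AsyncEngineArgs(disable_log_stats=True)`) skips the Prometheus logging path entirely.
- A defensive guard in `_log_counter` would at least keep one bad sample from killing the engine; see the sketch below.

A minimal sketch of such a guard, assuming negative deltas are spurious and safe to drop (illustrative only, not the actual vLLM patch):

```python
# Sketch of a guard for StatLogger._log_counter in vllm/engine/metrics.py.
def _log_counter(self, counter, data):
    # prometheus_client raises ValueError on negative increments, which
    # currently crashes the engine loop; drop bad samples instead.
    if data >= 0:
        counter.labels(**self.labels).inc(data)
```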