vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc]: Why prometheus metric vllm:request_success_total doubles the value? #5250

Closed Semihal closed 5 months ago

Semihal commented 5 months ago

Anything you want to discuss about vllm.

I am using the following query to display the `vllm:request_success_total` metric:

```
sum(increase(vllm:request_success_total{model_name="$MODEL_NAME"}[$__rate_interval])) by (finished_reason)
```

But each of my requests to the model is displayed on the graph with a value of 2. It seems the counter is mistakenly incremented twice for a single request.


simon-mo commented 5 months ago

@EthanqX can you take a look?

robertgshaw2-neuralmagic commented 5 months ago

How many generations are you running per request (e.g. what is the `n` parameter set to)? If you are requesting more than 1, we currently count each generation as a separate request.

Semihal commented 5 months ago

> How many generations are you running per request (e.g. what is the `n` parameter set to)? If you are requesting more than 1, we currently count each generation as a separate request.

I checked the log and see this message:

```
Received request : prompt: '<|begin▁of▁sentence|>You are a helpful assistant. You can help me by answering my questions. You can also ask me questions.### Instruction:\nwhat llm are you\n### Response:\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=12249, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [32013, 32013, 2042, 417, 245, 9396, 20391, 13, 1255, 482, 1341, 523, 457, 25923, 597, 4301, 13, 1255, 482, 835, 2076, 523, 4301, 13, 13518, 3649, 3475, 25, 185, 5003, 1703, 76, 417, 340, 185, 13518, 21289, 25, 185], lora_request: None.
```

So `vllm:request_success_total` increased by 1, with `n=1`. But... Grafana still shows 2. Could there be a problem with the Grafana query?

Semihal commented 5 months ago

Perhaps this Stack Overflow answer explains it: https://stackoverflow.com/a/49653270 -- `increase()` extrapolates the counter delta over the full query range, so a single increment can be reported as roughly 2 when the samples cover only part of the window.
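A simplified sketch of the extrapolation arithmetic behind Prometheus's `increase()` may make the doubling concrete. This is an illustration only, not Prometheus's exact code (the real implementation also caps extrapolation near the window boundaries); the sample timestamps and values below are hypothetical, not taken from this issue.

```python
def extrapolated_increase(samples, range_start, range_end):
    """Approximate Prometheus increase(): scale the observed counter
    delta up to the full query range.

    samples: list of (timestamp_seconds, counter_value) inside the window.
    """
    first_ts, first_val = samples[0]
    last_ts, last_val = samples[-1]
    raw_delta = last_val - first_val       # counter growth actually observed
    sampled_interval = last_ts - first_ts  # time span covered by the samples
    # The delta is extrapolated to the whole range, which inflates the
    # result when the samples cover only part of the window.
    return raw_delta * (range_end - range_start) / sampled_interval

# One real request: the counter goes 0 -> 1, with two scrapes 30s apart
# inside a 60s query window (hypothetical numbers).
print(extrapolated_increase([(0, 0), (30, 1)], 0, 60))  # 2.0
```

With two scrapes 30s apart inside a 60s window, a single increment is scaled by 60/30 and reported as 2.0, matching the doubled value on the graph.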