Semihal closed this issue 5 months ago.
@EthanqX can you take a look?
How many generations are you running per request (e.g., what is the `n` parameter set to)? If you are requesting more than 1, we currently count each n as a separate request.
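For reference, here is a minimal sketch of how `n` would be set through the OpenAI-compatible API (the base URL and model name below are assumptions, not taken from this issue):

```python
# Minimal sketch, assuming a vLLM OpenAI-compatible server at localhost:8000
# and a hypothetical model name. With n=2, a single HTTP request asks for two
# generations, which (per the comment above) currently counts as two requests
# in vllm:request_success_total.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="my-model",         # hypothetical model name
    prompt="what llm are you",
    n=2,                      # two generations in one request
    max_tokens=64,
)
for choice in completion.choices:
    print(choice.text)
```

So a panel showing 2 per request would be expected with `n=2`, but not with the default `n=1`.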
I checked the log and see this message:
```
Received request: prompt: '<|begin▁of▁sentence|>You are a helpful assistant. You can help me by answering my questions. You can also ask me questions.### Instruction:\nwhat llm are you\n### Response:\n', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=12249, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [32013, 32013, 2042, 417, 245, 9396, 20391, 13, 1255, 482, 1341, 523, 457, 25923, 597, 4301, 13, 1255, 482, 835, 2076, 523, 4301, 13, 13518, 3649, 3475, 25, 185, 5003, 1703, 76, 417, 340, 185, 13518, 21289, 25, 185], lora_request: None.
```
Note `n=1` in the sampling params, and `vllm:request_success_total` increased by 1. But Grafana still shows 2. Could there be a problem with the Grafana query?
Perhaps this post is the answer: https://stackoverflow.com/a/49653270
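For what it's worth, one mechanism that produces inflated values even when the counter only moves by 1 is Prometheus's `increase()`, which extrapolates the observed delta to cover the full range window. A way to check, reusing the same metric and `$MODEL_NAME` variable as in the dashboard query (this is a diagnostic sketch, not a drop-in replacement, since plain subtraction ignores counter resets):

```
sum by (finished_reason) (
  vllm:request_success_total{model_name="$MODEL_NAME"}
  - vllm:request_success_total{model_name="$MODEL_NAME"} offset $__rate_interval
)
```

If this version shows 1 per request while the `increase()` version shows 2, the doubling comes from extrapolation at query time; if both show 2, the counter is genuinely being incremented or collected twice.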
Anything you want to discuss about vllm.
I am using the following query to display the vLLM metric `vllm:request_success_total`:
```
sum(increase(vllm:request_success_total{model_name="$MODEL_NAME"}[$__rate_interval])) by (finished_reason)
```
But every single request to the model shows up on the graph with a value of 2. It looks like the value is incremented twice for a single request.
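Another common cause of an exact 2x in a summed panel is the same server being scraped by two Prometheus targets (e.g., two jobs pointing at one process), in which case `sum(...)` adds the same counter twice. A quick check with the same label filter as the query above (a sketch; adjust the grouping labels to your setup):

```
count(sum by (job, instance) (vllm:request_success_total{model_name="$MODEL_NAME"}))
```

If this returns more than 1 while only one vLLM process is running, the doubling would come from the scrape configuration rather than from vLLM's counting.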