Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
Model Input Dumps
No response
🐛 Describe the bug
It seems that the prefill throughput and the decode throughput are both computed by dividing the token counts by the overall (prefill + decode) time, i.e.,
$$\text{prefill throughput} = \frac{\text{num of input tokens}}{\text{prefill time} + \text{decode time}}$$
$$\text{decode throughput} = \frac{\text{num of output tokens}}{\text{prefill time} + \text{decode time}}$$
whereas they should be
$$\text{prefill throughput} = \frac{\text{num of input tokens}}{\text{prefill time}}$$
$$\text{decode throughput} = \frac{\text{num of output tokens}}{\text{decode time}}$$
This significantly distorts the reported performance numbers in scenarios with long inputs and long outputs.
See https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py#L440C1-L446C46 and https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py#L416C1-L418C59
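For illustration, here is a minimal sketch of the expected behavior; it is not vLLM's actual code from `metrics.py`, and the function and parameter names are assumptions. Each phase's token count is divided by that phase's own elapsed time, and the example in the main guard shows how much dividing by the combined time skews the numbers.

```python
# Minimal sketch, not vLLM's actual metrics code: the helper below and its
# argument names are illustrative assumptions.

def compute_throughputs(num_prompt_tokens: int,
                        num_generation_tokens: int,
                        prefill_time_s: float,
                        decode_time_s: float) -> tuple[float, float]:
    """Return (prefill_throughput, decode_throughput) in tokens/s."""
    # Each phase is divided by its own elapsed time, not the combined window.
    prefill_throughput = num_prompt_tokens / prefill_time_s
    decode_throughput = num_generation_tokens / decode_time_s
    return prefill_throughput, decode_throughput


if __name__ == "__main__":
    # Example: 8000 input tokens prefilled in 0.5 s, 2000 output tokens decoded in 10 s.
    # Dividing both counts by the combined 10.5 s would report ~762 tok/s prefill and
    # ~190 tok/s decode, instead of the correct 16000 tok/s and 200 tok/s.
    print(compute_throughputs(8000, 2000, 0.5, 10.0))
```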