vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: The throughput computation in metrics.py seems wrong #10261

Closed Achazwl closed 1 day ago

Achazwl commented 1 day ago

Your current environment

The output of `python collect_env.py`:

```text
Your output of `python collect_env.py` here
```

Model Input Dumps

No response

🐛 Describe the bug

It seems that the prefill throughput and the decode throughput are both divided by the overall elapsed time, i.e.,

$$\text{prefill throughput} = \frac{\text{number of input tokens}}{\text{prefill time} + \text{decode time}}, \qquad \text{decode throughput} = \frac{\text{number of output tokens}}{\text{prefill time} + \text{decode time}},$$

but they should be

$$\text{prefill throughput} = \frac{\text{number of input tokens}}{\text{prefill time}}, \qquad \text{decode throughput} = \frac{\text{number of output tokens}}{\text{decode time}}.$$

This significantly distorts the reported performance numbers in scenarios with both long inputs and long outputs (see the numeric sketch below).
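For a concrete sense of the gap, here is a minimal sketch with made-up numbers; none of these values come from a real benchmark, they only illustrate how the shared denominator suppresses the reported prefill throughput:

```python
# Hypothetical numbers, for illustration only: 1,000 prompt tokens processed in
# 0.5 s of prefill, and 1,000 generated tokens produced over 20 s of decode.
num_prompt_tokens = 1_000
num_generation_tokens = 1_000
prefill_time = 0.5   # seconds (assumed)
decode_time = 20.0   # seconds (assumed)

total_time = prefill_time + decode_time

# Current behaviour: both throughputs share the overall elapsed time.
reported_prefill_tps = num_prompt_tokens / total_time    # ~48.8 tokens/s
# Per-phase definition: divide by the time spent in that phase only.
actual_prefill_tps = num_prompt_tokens / prefill_time    # 2000.0 tokens/s

print(reported_prefill_tps, actual_prefill_tps)
```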

See https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py#L440C1-L446C46 and https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py#L416C1-L418C59

```python
# Abridged from vllm/engine/metrics.py (the file imports numpy as np and
# typing.List): both throughputs are divided by the same wall-clock window
# (now - last_log), so prefill and decode time are mixed together.
prompt_throughput = get_throughput(self.num_prompt_tokens,
                                   now=stats.now,
                                   last_log=self.last_local_log)
generation_throughput = get_throughput(
    self.num_generation_tokens,
    now=stats.now,
    last_log=self.last_local_log)


def get_throughput(tracked_stats: List[int], now: float,
                   last_log: float) -> float:
    return float(np.sum(tracked_stats) / (now - last_log))
```
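
A minimal sketch of the per-phase version, assuming the stats object tracked separate prefill and decode durations for the logging interval; the field names `prefill_time_s` and `decode_time_s` below are hypothetical and are not part of vLLM's current `Stats`:

```python
from typing import List

import numpy as np


def get_phase_throughput(tracked_tokens: List[int],
                         phase_time_s: float) -> float:
    """Tokens per second over the time spent in a single phase only."""
    if phase_time_s <= 0:
        return 0.0
    return float(np.sum(tracked_tokens) / phase_time_s)


# Hypothetical usage: prefill_time_s / decode_time_s would have to be
# accumulated by the engine per logging interval, which it does not do today.
# prompt_throughput = get_phase_throughput(self.num_prompt_tokens,
#                                          stats.prefill_time_s)
# generation_throughput = get_phase_throughput(self.num_generation_tokens,
#                                              stats.decode_time_s)
```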

Before submitting a new issue...

Achazwl commented 1 day ago

Oh, it seems there is no better solution for batch serving: with continuous batching, prefill and decode steps from different requests are interleaved within the same logging window, so the elapsed time cannot be cleanly attributed to one phase. The related output should probably be disabled (or ignored) when using benchmarks/benchmark_latency.py.
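For single-request latency runs, one common client-side workaround is to time the two phases separately, treating time-to-first-token as the prefill time. A minimal sketch, assuming any iterator that yields one generated token at a time (this is not a vLLM API, just an illustration):

```python
import time
from typing import Iterable


def measure_phase_throughput(stream: Iterable[str],
                             num_prompt_tokens: int) -> None:
    """Split wall-clock time at the first token: before it counts as prefill,
    after it counts as decode, and report tokens/s for each phase."""
    start = time.perf_counter()
    first_token_time = None
    num_output_tokens = 0

    for _ in stream:                      # each item is one generated token
        if first_token_time is None:
            first_token_time = time.perf_counter()
        num_output_tokens += 1
    end = time.perf_counter()

    prefill_time = (first_token_time or end) - start
    decode_time = end - (first_token_time or end)

    if prefill_time > 0:
        print(f"prefill: {num_prompt_tokens / prefill_time:.1f} tok/s")
    if decode_time > 0:
        print(f"decode:  {num_output_tokens / decode_time:.1f} tok/s")
```

Note that time-to-first-token also includes sampling of the first output token and any queueing delay, so this only approximates the pure prefill time.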