vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: The tail problem #5123

Open ZixinxinWang opened 4 months ago

ZixinxinWang commented 4 months ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

I wonder if I'm the only one who has run into this problem: each time the generation process is wrapping up, it just sticks there for a long time. Most of the time is spent at the very beginning and at the very end of the run. I wonder why this happens and how I can fix it.

mgoin commented 4 months ago

@ZixinxinWang please provide a specific example with code and timing on your system (including the output of `python collect_env.py`) so we can reproduce the issue and help you diagnose it.
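A minimal sketch of the kind of timing report being requested might look like the following. The `timed` helper and the `facebook/opt-125m` model name are illustrative choices, not part of the original thread; `LLM`, `SamplingParams`, and `llm.generate` are vLLM's offline-inference entry points. Printing per-phase wall-clock times would show whether the delay is in engine startup, generation itself, or teardown.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs), print the elapsed wall-clock time, and return the result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


if __name__ == "__main__":
    # Assumes vLLM is installed; swap in the model you actually see the issue with.
    from vllm import LLM, SamplingParams

    llm = timed("engine startup", LLM, model="facebook/opt-125m")
    params = SamplingParams(max_tokens=64)
    outputs = timed("generation", llm.generate, ["Hello, my name is"], params)
    # Process exit (teardown) happens after this point; timing the whole script
    # externally, e.g. with `time python repro.py`, would expose a slow shutdown.
```

Attaching the printed timings alongside the `collect_env.py` output would let maintainers see which phase dominates on the reporter's system.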