vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Meaningless output when running long-context inference with Qwen2.5 models on vllm>=0.6.3 #10298

Open piamo opened 1 day ago

piamo commented 1 day ago

Your current environment

The output of `python collect_env.py`

```text
Your output of `python collect_env.py` here
```

Model Input Dumps

models: Qwen2.5-Coder-7B-Instruct, Qwen2.5-7B-Instruct
vllm: 0.6.3
input tokens: >8000

🐛 Describe the bug

I have tested vllm 0.6.0 through 0.6.2, as well as 0.5.5; all of those older versions work fine.

So this bug appears to have been introduced in 0.6.3.
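
A minimal reproduction sketch of the setup described above. The model name, prompt construction, context length, and sampling settings below are assumptions for illustration; the report only states that Qwen2.5 models with >8000 input tokens are affected.

```python
# Sketch of a long-context repro, assuming Qwen2.5-7B-Instruct and a repeated-text prompt.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=16384,  # large enough to hold a >8000-token prompt
)

# Build a prompt well past 8000 tokens by repeating a short passage.
passage = "The quick brown fox jumps over the lazy dog. " * 2000
prompt = passage + "\n\nSummarize the text above in one sentence."

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate([prompt], params)

# On vllm >= 0.6.3 the reporters see garbled text here; on <= 0.6.2 the output looks normal.
print(outputs[0].outputs[0].text)
```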


CHNtentes commented 1 day ago

Same here. With qwen2.5-72b-instruct-awq and a ~10000-token input, the output is garbage.

CHNtentes commented 1 day ago

Downgraded to vllm 0.6.2 and the output is much better.