vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: The Qwen series models produce garbled output when generating long texts. #9825

Open · hongqing1986 opened 4 weeks ago

hongqing1986 commented 4 weeks ago

Your current environment

vLLM version: v0.6.3.post1

🐛 Describe the bug

In the latest version, v0.6.3.post1, generating long texts (for example, once the output reaches roughly 21,000 tokens) produces content that is mostly garbled. After verifying, long-text generation works correctly in v0.6.2 with the qwen2-7b-instruct model. I also tested other models, such as qwen2.5-72b-instruct, and they exhibit the same problem.
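
For reference, a minimal repro sketch of the setup described above; the model path, prompt, and sampling values are illustrative assumptions, not the exact run:

```python
# Minimal repro sketch: force a very long generation and inspect the tail,
# where the garbling reportedly starts (around ~21,000 tokens).
from vllm import LLM, SamplingParams

# Model and context length are assumptions for illustration.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", max_model_len=32768)

sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=24000,  # push past the ~21k-token point mentioned above
)

prompt = "Write an extremely detailed, chapter-by-chapter outline of a long novel."
outputs = llm.generate([prompt], sampling_params)

# Print only the tail of the completion, where the garbled text appears.
print(outputs[0].outputs[0].text[-2000:])
```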


jeejeelee commented 4 weeks ago

Perhaps similar to https://github.com/vllm-project/vllm/issues/9769

frei-x commented 3 weeks ago

+1

DarkLight1337 commented 2 weeks ago

Can you try again using the latest version? It should be fixed as of #9826.
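
If it helps, a quick way to confirm which build you're on before re-testing (the fix landed after v0.6.3.post1):

```python
# Check the installed vLLM version; anything at or below 0.6.3.post1
# predates the fix referenced above.
import vllm
print(vllm.__version__)
```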

frei-x commented 2 weeks ago

> Can you try again using the latest version? It should be fixed as of #9826.

When will 0.6.4 be released?

DarkLight1337 commented 2 weeks ago

A release is quite overdue; we're planning to publish the next update this week.