vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: The Qwen series models produce garbled output when generating long texts. #9825

Open · hongqing1986 opened 4 weeks ago

hongqing1986 commented 4 weeks ago

Your current environment

vLLM version: v0.6.3.post1

🐛 Describe the bug

In the latest version, v0.6.3.post1, generating long texts (for example, once the output reaches roughly 21,000 tokens) produces content that is mostly garbled. After verifying, long-text generation works correctly in v0.6.2 with the qwen2-7b-instruct model. I also tested other models, such as qwen2.5-72b-instruct, and they exhibit the same problem.
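
For reference, a minimal repro sketch of the setup described above; the model path, prompt, and sampling values are illustrative assumptions, not the exact run:

```python
# Minimal repro sketch: force a very long generation and inspect the tail,
# where the garbling reportedly starts (around ~21,000 tokens).
from vllm import LLM, SamplingParams

# Model and context length are assumptions for illustration.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", max_model_len=32768)

sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=24000,  # push past the ~21k-token point mentioned above
)

prompt = "Write an extremely detailed, chapter-by-chapter outline of a long novel."
outputs = llm.generate([prompt], sampling_params)

# Print only the tail of the completion, where the garbled text appears.
print(outputs[0].outputs[0].text[-2000:])
```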


jeejeelee commented 4 weeks ago

Perhaps similar to https://github.com/vllm-project/vllm/issues/9769

frei-x commented 3 weeks ago

+1

DarkLight1337 commented 2 weeks ago

Can you try again using the latest version? It should be fixed as of #9826.
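
If it helps, a quick way to confirm which build you're on before re-testing (the fix landed after v0.6.3.post1):

```python
# Check the installed vLLM version; anything at or below 0.6.3.post1
# predates the fix referenced above.
import vllm
print(vllm.__version__)
```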

frei-x commented 2 weeks ago

> Can you try again using the latest version? It should be fixed as of #9826.

When will 0.6.4 be released?

DarkLight1337 commented 2 weeks ago

A release is quite overdue; we're planning to publish the next update this week.