vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Meaningless output when running long-context inference with Qwen2.5 models on vllm>=0.6.3 #10298

Open piamo opened 1 day ago

piamo commented 1 day ago

Your current environment

The output of `python collect_env.py`

```text
Your output of `python collect_env.py` here
```

Model Input Dumps

models: Qwen2.5-Coder-7B-Instruct, Qwen2.5-7B-Instruct
vllm: 0.6.3
input tokens: >8000

🐛 Describe the bug

I have tested vllm 0.6.0 through 0.6.2, as well as 0.5.5; all of those older versions work fine.

So this bug appears to have been introduced in 0.6.3.
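
A minimal reproduction sketch of the setup described above. The model name, prompt construction, context length, and sampling settings below are assumptions for illustration; the report only states that Qwen2.5 models with >8000 input tokens are affected.

```python
# Sketch of a long-context repro, assuming Qwen2.5-7B-Instruct and a repeated-text prompt.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=16384,  # large enough to hold a >8000-token prompt
)

# Build a prompt well past 8000 tokens by repeating a short passage.
passage = "The quick brown fox jumps over the lazy dog. " * 2000
prompt = passage + "\n\nSummarize the text above in one sentence."

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate([prompt], params)

# On vllm >= 0.6.3 the reporters see garbled text here; on <= 0.6.2 the output looks normal.
print(outputs[0].outputs[0].text)
```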


CHNtentes commented 1 day ago

Same here. With qwen2.5-72b-instruct-awq and a ~10000-token input, the output is garbage.

CHNtentes commented 1 day ago

Downgraded to vllm 0.6.2 and the output is much better.