vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Repeatedly printing <|im_end|><|im_start|> after the conversation ends #4251

Open huangshengfu opened 6 months ago

huangshengfu commented 6 months ago

Your current environment

 docker run --rm --runtime nvidia --gpus all --name vllm-qwen72b \
   -v ~/.cache/huggingface:/root/.cache/huggingface \
   -v /data1/Download/models/Qwen-72B-Chat-Int4:/data/shared/Qwen/Qwen-Chat \
   -p 8901:8000 --ipc=host \
   vllm/vllm-openai:latest --model /data/shared/Qwen/Qwen-Chat \
   --max-model-len 6400 --trust-remote-code --tensor-parallel-size 2 \
   --gpu-memory-utilization 0.9 --served-model-name qwen72b --api-key "xxxx"

🐛 Describe the bug

I encountered an issue while running the model in a Docker environment. The model is Qwen-72B, and the conversation cannot end properly. [screenshot]

lijiajun1997 commented 6 months ago

Same problem when using vLLM + ChatGLM3 + OneAPI + FastGPT. Not sure which part is at fault.

huangshengfu commented 6 months ago

It looks like a vLLM issue; I haven't found a fix yet. Please ping me if you find a solution.

huangdehong commented 6 months ago

Me too. I saw what looks like a similar workaround, but I don't know how to apply it in vLLM: https://zhuanlan.zhihu.com/p/695477673

huangshengfu commented 6 months ago

My problem is solved. I connect to FastGPT through OneAPI, and after adding the stop parameter <|im_end|> to the FastGPT config file it works.

lijiajun1997 commented 6 months ago

> My problem is solved. I connect to FastGPT through OneAPI, and after adding the stop parameter <|im_end|> to the FastGPT config file it works.

Could you share the config?

huangshengfu commented 6 months ago

"defaultConfig":{"stop": "<|im_end|>"}

QuanhuiGuan commented 6 months ago

I solved it by adding the stop token ID to the request.
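
If the frontend cannot inject a stop string, the stop token IDs can also be sent as a vLLM extra sampling parameter through extra_body. A minimal sketch, assuming the same server as above and that 151645 is the <|im_end|> token id for the Qwen tokenizer (verify against your model's tokenizer before relying on it):

```python
# Sketch: pass stop_token_ids so decoding stops on the <|im_end|> token id
# itself rather than on its string form.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8901/v1", api_key="xxxx")

resp = client.chat.completions.create(
    model="qwen72b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"stop_token_ids": [151645]},  # assumed id of <|im_end|>; check your tokenizer
)
print(resp.choices[0].message.content)
```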

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!