li995495592 opened this issue 5 months ago
Try adding a system prompt at the very beginning of the conversation: {"role": "system", "content": "..."}
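For example (a minimal sketch; the server URL and model name are assumptions for a local vLLM OpenAI-compatible server):

```python
import requests

# Assumed local vLLM OpenAI-compatible server; adjust URL/model to your setup.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen1.5-14B-Chat",
        "messages": [
            # System prompt placed at the very start of the conversation.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```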
Already added one.
According to Qwen people, please try the v0.3.2 version
vLLM v0.3.2 also has the same problem.
I am facing the same issue with this model.
I also have this problem with qwen1.5-0.5B-chat model which is supervised fine-tuned on transformers 4.38.2.
@li995495592 Could you try vllm=0.4.0.post1?
@esmeetu vllm 0.3.2, 0.3.3, 0.4.0, 0.4.0.post1 all have this problem when serving qwen-1.5-14b-gptq-int4
Hi Huarong, I compiled the latest version (0.4.0.post1) of vLLM locally and successfully ran both the offline inference demo and the OpenAI-style API server inference. Here is the screenshot:
Thanks for trying, @xin-li-67. The !!!! output occurs from time to time depending on the prompt. Can you try with more than 100 samples?
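Something along these lines should surface it (a rough sketch; the prompts, model name, and the "!!!" heuristic are my assumptions):

```python
import requests

URL = "http://localhost:8000/v1/chat/completions"   # assumed local vLLM server
MODEL = "qwen1.5-14b-gptq-int4"                     # assumed served model name

prompts = [f"Compute {i} + {i + 1} and explain the result." for i in range(120)]
bad = 0
for p in prompts:
    resp = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": p}],
        "max_tokens": 64,
    })
    text = resp.json()["choices"][0]["message"]["content"]
    if "!!!" in text:  # crude check for the garbage pattern reported above
        bad += 1
print(f"{bad}/{len(prompts)} responses contained '!!!'")
```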
More details:
- We get correct results from the bf16 model. But when running inference with our trained qwen1.5-14b-gptq-int4 model, NaN may occur on prompts where the output probability is very high. The output is then a long run of !!!!!; the !!! mainly follows digits like 1 or 2.
- auto-gptq is probably not the problem, since the results are fine when we run inference with transformers instead of vLLM (see the sketch after this list).
- Serving with vLLM and AWQ int4 works fine.
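A minimal transformers-side check would look roughly like this (the checkpoint path and prompt are placeholders; GPTQ loading assumes auto-gptq/optimum are installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/qwen1.5-14b-gptq-int4"  # placeholder for the trained GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

messages = [{"role": "user", "content": "What is 1 + 2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```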
Versions:
Is there any progress?
How can this be solved? My qwen1.5-0.5b trained on the hh dataset can almost never generate a normal response with vLLM. However, I get normal responses with HF transformers.
Met the same problem.
Have you tried using a Chinese prompt?
Same problem here with the glm-4-9b-chat model.

Update: reducing --max-model-len from 8000 to 6144 and reducing --gpu-memory-utilization from 0.95 to 0.9 fixed the problem.
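In offline-engine terms, the equivalent settings would be (a sketch; the model id is an assumption):

```python
from vllm import LLM, SamplingParams

# Reduced context length and GPU memory fraction, mirroring the serving flags above.
llm = LLM(
    model="THUDM/glm-4-9b-chat",   # assumed HF model id
    trust_remote_code=True,
    max_model_len=6144,            # was 8000
    gpu_memory_utilization=0.9,    # was 0.95
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```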
Is there any progress?
Same problem here. Can anyone explain it? The screenshots below show attempts with an English prompt and a Chinese prompt; the Chinese one works normally.
> I also have this problem with qwen1.5-0.5B-chat model which is supervised fine-tuned on transformers 4.38.2.
Have you solved this problem? I am also troubled by it. The non-fine-tuned qwen1.5-0.5B-chat model works well, but my SFT-ed qwen1.5-0.5B-chat model does not work when inferencing with vLLM (inference with the official script works fine); the output just repeats endlessly. My vLLM version is 0.3.0. I am not sure how to solve this. If you have solved it, please share, thanks! My repeated output looks like:
Your current environment
Deploying Qwen1.5-14B-Chat with vllm==0.3.3 on a Tesla V100-PCIE-32GB produces output that is entirely exclamation marks, with no real result.
🐛 Describe the bug
Deploying Qwen1.5-14B-Chat with vllm==0.3.3 on a Tesla V100-PCIE-32GB produces output that is entirely exclamation marks, with no real result; the deployment environment reports no errors.