gaijigoumeiren opened 8 months ago
I ran into the same problem: it occurs on both 0.3.1 and 0.3.2, with both Llama models and Qwen1.5.
Solved for Qwen1.5-7B: max_tokens defaults to 16, so pass a larger max_tokens when building SamplingParams, e.g. `output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))`.
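For reference, a minimal self-contained sketch of the offline-inference path described above (the model path and prompt are placeholders; the relevant setting is just max_tokens):

```python
from vllm import LLM, SamplingParams

# Placeholder model path; substitute your local Qwen1.5 checkpoint.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")

# max_tokens defaults to 16, which truncates most answers;
# raise it explicitly so generation is not cut short.
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(["Tell me about large language models."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```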
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
When deploying Qwen1.5-7B-Chat, I found that the last 10 characters of the API response were missing, which is exactly the length of the stop token <|im_end|>.
Temporary workaround: pass include_stop_str_in_output=True when calling the API. The likely cause is that include_stop_str_in_output defaults to False, and in https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py#L966 the trailing stop token is truncated from the output; but seq.output_text does not actually contain <|im_end|>, so the truncation cuts off the wrong characters.
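A sketch of the workaround from the client side, assuming a vLLM OpenAI-compatible server and that it accepts include_stop_str_in_output as an extra request field (it is a SamplingParams option; whether the server honors it depends on your vLLM version, and the URL/model below are placeholders):

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen1.5-7B-Chat",
    messages=[{"role": "user", "content": "你好"}],
    # Keep the stop string in the output so the truncation in
    # llm_engine.py does not cut real characters off the end.
    extra_body={"include_stop_str_in_output": True},
)
print(response.choices[0].message.content)
```

For offline use, the same flag can be set directly: `SamplingParams(max_tokens=512, include_stop_str_in_output=True)`.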
I wonder if changing it to something like the following would fix it:
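(The code the commenter attached did not survive extraction; the following is only a hedged sketch of the kind of guard being suggested, based on the truncation logic linked above: strip the stop string only when it is actually present at the end of seq.output_text.)

```python
# Sketch only — not the original commenter's patch and not the exact
# vLLM signature. The idea: in the stop-string finalization step of
# vllm/engine/llm_engine.py, check before truncating.
def _finalize_sequence(self, seq, sampling_params, stop_string: str) -> None:
    if sampling_params.include_stop_str_in_output or not stop_string:
        return
    if seq.output_text.endswith(stop_string):
        # Safe to truncate: the stop string really is in the output.
        seq.output_text = seq.output_text[:-len(stop_string)]
    # Otherwise leave seq.output_text untouched instead of cutting
    # off the last len(stop_string) characters of real content.
```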