gaijigoumeiren opened 8 months ago
I ran into the same problem: it occurs on both 0.3.1 and 0.3.2, with both Llama models and Qwen1.5.
Solved for Qwen1.5-7B: max_tokens defaults to 16, so pass a larger max_tokens when building SamplingParams, e.g. `output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))`.
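For reference, a minimal self-contained sketch of the offline-inference path described above (the model path and prompt are placeholders; the relevant setting is just max_tokens):

```python
from vllm import LLM, SamplingParams

# Placeholder model path; substitute your local Qwen1.5 checkpoint.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")

# max_tokens defaults to 16, which truncates most answers;
# raise it explicitly so generation is not cut short.
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(["Tell me about large language models."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```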
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
When deploying Qwen1.5-7B-Chat, I found that the last 10 characters of the API response were missing, which is exactly the length of the stop token <|im_end|>.
Temporary workaround: pass include_stop_str_in_output=True when calling the API. The likely cause is that include_stop_str_in_output defaults to False, and in https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py#L966 the trailing stop token is truncated from the output; but seq.output_text does not actually contain <|im_end|>, so the truncation cuts off the wrong characters.
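A sketch of the workaround from the client side, assuming a vLLM OpenAI-compatible server and that it accepts include_stop_str_in_output as an extra request field (it is a SamplingParams option; whether the server honors it depends on your vLLM version, and the URL/model below are placeholders):

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen1.5-7B-Chat",
    messages=[{"role": "user", "content": "你好"}],
    # Keep the stop string in the output so the truncation in
    # llm_engine.py does not cut real characters off the end.
    extra_body={"include_stop_str_in_output": True},
)
print(response.choices[0].message.content)
```

For offline use, the same flag can be set directly: `SamplingParams(max_tokens=512, include_stop_str_in_output=True)`.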
I wonder if changing it to something like the following would fix it:
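(The code the commenter attached did not survive extraction; the following is only a hedged sketch of the kind of guard being suggested, based on the truncation logic linked above: strip the stop string only when it is actually present at the end of seq.output_text.)

```python
# Sketch only — not the original commenter's patch and not the exact
# vLLM signature. The idea: in the stop-string finalization step of
# vllm/engine/llm_engine.py, check before truncating.
def _finalize_sequence(self, seq, sampling_params, stop_string: str) -> None:
    if sampling_params.include_stop_str_in_output or not stop_string:
        return
    if seq.output_text.endswith(stop_string):
        # Safe to truncate: the stop string really is in the output.
        seq.output_text = seq.output_text[:-len(stop_string)]
    # Otherwise leave seq.output_text untouched instead of cutting
    # off the last len(stop_string) characters of real content.
```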