xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models — use open LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend API for open-source large language models.
Apache License 2.0
2.16k stars 252 forks

Qwen1.5 prompt template is missing the default system message #240

Closed liuyanyi closed 4 months ago

liuyanyi commented 4 months ago

The following items must be checked before submission

Type of problem

Other issues

Operating system

None

Detailed description of the problem

Comparing the qwen tokenizer's chat_template against the template in api-for-open-llm: if the messages contain no system message, the api-for-open-llm output is missing the leading system prompt.

from transformers import AutoTokenizer

# `get_prompt_adapter` comes from api-for-open-llm's template module

if __name__ == '__main__':
    tokenizer = AutoTokenizer.from_pretrained("/large-storage/model/Qwen1.5/Qwen1.5-72B-Chat/")

    chat = [
        {"role": "user", "content": "Hello, how are you?"}
    ]

    template = get_prompt_adapter(prompt_name="qwen2")
    messages = template.postprocess_messages(chat)

    text_ori = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    text_api = template.apply_chat_template(messages, add_generation_prompt=True)

    print("========== text_ori ==========")
    print(text_ori)

    print("========== text_api ==========")
    print(text_api)

Output:

========== text_ori ==========
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant

========== text_api ==========
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant

For reference, the original Qwen tokenizer template contains this snippet:

{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}
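In plain Python, the Jinja condition above amounts to prepending the default system block whenever the first message is not a system message. A minimal sketch of that logic (the helper name `build_qwen_prompt` is hypothetical; the ChatML markers and default system text are taken from the template above):

```python
DEFAULT_SYSTEM = "You are a helpful assistant"

def build_qwen_prompt(messages, add_generation_prompt=True):
    """Render messages in Qwen's ChatML format, injecting the
    default system message when none is supplied."""
    parts = []
    # Mirrors: {% if loop.first and messages[0]['role'] != 'system' %}
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

With the single-user-message chat from the reproduction script, this yields the same text as `text_ori` above, including the injected system block.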

Dependencies

# Please paste the dependencies here

Runtime logs or screenshots

# Please paste the runtime log here
xusenlinzy commented 4 months ago

This has been updated.