hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
33.51k stars, 4.11k forks

Are the spaces in the middle of the chatglm3 prompt template correct? #3095

Closed luoqishuai closed 7 months ago

luoqishuai commented 7 months ago

Reproduction

from llmtuner.data import template
from transformers import AutoTokenizer
model_path = "/home/dp-ai-server/backupdata/NLP/qishuai.luo/pretrain_model/chatglm3-6b-32k"
# This second assignment overrides the one above, so chatglm3-6b is the model actually loaded.
model_path = "/home/dp-ai-server/backupdata/NLP/qishuai.luo/pretrain_model/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
t = template.get_template_and_fix_tokenizer(name="chatglm3_system", tokenizer=tokenizer)
query = "您好"
response = ""
messages = [{"role": "user", "content": query}] + [{"role": "assistant", "content": response}]
# Encode a single turn and decode the prompt ids back to text for inspection.
tokenizer.decode(t.encode_oneturn(tokenizer, messages=messages)[0])

Expected behavior

Returns [gMASK]sop<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 您好<|assistant|>

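There are 2 spaces between <|system|> and You (one on each side of the \n), and likewise 2 spaces between <|user|> and 您好. Is this code the right way to obtain the model's input, and are the spaces that appear in the chatglm3 prompt correct? (The example input given in https://github.com/vllm-project/vllm/pull/1261 has no spaces.)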
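To make the whitespace in question easier to see, here is a minimal sketch that lists the whitespace run following each role marker in a decoded prompt. The helper name whitespace_after_roles and the shortened sample string are hypothetical, and the \n shown in the output above is assumed to stand for a real newline; nothing here is part of llmtuner or transformers.

```python
import re

# Matches role markers such as <|system|>, <|user|>, <|assistant|>.
ROLE_TOKEN = re.compile(r"<\|(\w+)\|>")


def whitespace_after_roles(decoded: str) -> list[tuple[str, str]]:
    """Return (role, whitespace immediately following the role marker) pairs."""
    results = []
    for match in ROLE_TOKEN.finditer(decoded):
        rest = decoded[match.end():]
        # \s* always matches (possibly the empty string), so .group(0) is safe.
        whitespace = re.match(r"\s*", rest).group(0)
        results.append((match.group(1), whitespace))
    return results


# Hypothetical, shortened stand-in for the decoded prompt shown above.
decoded = "[gMASK]sop<|system|> \n You are ChatGLM3.<|user|> \n 您好<|assistant|>"

for role, whitespace in whitespace_after_roles(decoded):
    # repr() makes spaces and newlines visible.
    print(role, repr(whitespace))
```

Decoded text does not always map one-to-one onto token ids, so comparing the id lists returned by encode_oneturn directly is probably a more reliable check than comparing decoded strings.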

System Info

transformers=4.38.2, LLaMA-Factory code version 20240313

Others

No response

hiyouga commented 7 months ago

It has no effect.