Closed: luoqishuai closed this issue 7 months ago
Reproduction

```python
from llmtuner.data import template
from transformers import AutoTokenizer

# Two checkpoints were tried; only the last assignment takes effect.
# model_path = "/home/dp-ai-server/backupdata/NLP/qishuai.luo/pretrain_model/chatglm3-6b-32k"
model_path = "/home/dp-ai-server/backupdata/NLP/qishuai.luo/pretrain_model/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
t = template.get_template_and_fix_tokenizer(name="chatglm3_system", tokenizer=tokenizer)

query = "您好"
response = ""
messages = [{"role": "user", "content": query}] + [{"role": "assistant", "content": response}]

# encode_oneturn returns (prompt_ids, response_ids); decode only the prompt ids.
print(tokenizer.decode(t.encode_oneturn(tokenizer, messages=messages)[0]))
```

Expected behavior

The code returns:

[gMASK]sop<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 您好<|assistant|>

The segment `<|system|> \n You` contains two spaces, one on each side of `\n`, and the decoded output shows the same spacing in `<|user|> \n 您好`. Is this the correct way to obtain the input that the model actually receives? And are the spaces that appear for chatglm3 correct? (The example input given in https://github.com/vllm-project/vllm/pull/1261 has no such spaces.)

System Info

transformers=4.38.2, LLaMA-Factory code version 20240313

Others

No response
It makes no difference.
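To see exactly which whitespace characters follow each role tag in a decoded prompt, a small helper along these lines may be useful. It is only a sketch: `whitespace_after_role_tags` and `ROLE_TAG` are hypothetical names introduced here, not part of LLaMA-Factory or transformers, and the sample string is a shortened copy of the output quoted in the issue, assuming `\n` there stands for a real newline. It inspects the decoded text only and says nothing about how the tokenizer encoded those characters.

```python
import re

# Hypothetical pattern covering the ChatGLM3-style role tags seen in the issue.
ROLE_TAG = re.compile(r"<\|(?:system|user|assistant|observation)\|>")


def whitespace_after_role_tags(decoded: str) -> list[tuple[str, str]]:
    """Return (tag, whitespace) pairs for every role tag found in `decoded`.

    `whitespace` is the run of whitespace characters (spaces, newlines, tabs)
    that immediately follows the tag, or "" if there is none.
    """
    pairs = []
    for match in ROLE_TAG.finditer(decoded):
        rest = decoded[match.end():]
        # Everything removed by lstrip() is the leading whitespace run.
        leading = rest[: len(rest) - len(rest.lstrip())]
        pairs.append((match.group(0), leading))
    return pairs


# Shortened sample based on the decoded prompt quoted in the issue.
sample = "[gMASK]sop<|system|> \n You are ChatGLM3.<|user|> \n 您好<|assistant|>"
pairs = whitespace_after_role_tags(sample)
for tag, whitespace in pairs:
    # repr() makes spaces and newlines visible.
    print(tag, repr(whitespace))
```

Running the same helper on the string produced by `tokenizer.decode(...)` shows whether the two spaces are really present in the decoded text or are only an artifact of how the output was copied.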