ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0
18.23k stars 1.86k forks

In ceval, how does sA_id (from encoding "A") differ from A_id (from encoding ":A"), and what does each represent? Debugging confirms the ids really are different #809

Closed lizhzh8 closed 1 year ago

lizhzh8 commented 1 year ago

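Items that must be checked before submitting

Issue type

Other issues

Base model

LLaMA-Plus-13B

Operating system

Linux

Detailed description of the problem

In ceval, how does sA_id, obtained by encoding "A", differ from A_id, obtained by encoding ":A"? What does each one represent?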

        self.sA_id = self.tokenizer.encode("A", add_special_tokens=False)[0]
        self.sB_id = self.tokenizer.encode("B", add_special_tokens=False)[0]
        self.sC_id = self.tokenizer.encode("C", add_special_tokens=False)[0]
        self.sD_id = self.tokenizer.encode("D", add_special_tokens=False)[0]
        self.A_id = self.tokenizer.encode(":A")[-1]
        self.B_id = self.tokenizer.encode(":B")[-1]
        self.C_id = self.tokenizer.encode(":C")[-1]
        self.D_id = self.tokenizer.encode(":D")[-1]

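Dependencies (must be provided for code-related issues)

# Please paste your dependency information here

Run logs or screenshots

# Please paste your run logs here

airaria commented 1 year ago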

A_id and sA_id are the ids of the tokens "A" and "▁A", respectively. The same applies to B, C, and D.
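To illustrate why the two calls land on different tokens, here is a minimal sketch using a toy tokenizer. It assumes SentencePiece-style preprocessing, where a "▁" is prepended to the input, and it uses greedy longest-match in place of the real segmentation algorithm. The vocabulary, the ids, and the names `TOY_VOCAB`, `toy_tokenize`, and `toy_encode` are all made up for illustration and are not the real LLaMA values.

```python
# Toy illustration only: a hypothetical vocabulary with made-up ids.
# These are NOT the real LLaMA token ids.
TOY_VOCAB = {"<s>": 1, "▁": 2, "▁A": 3, "A": 4, "▁:": 5, ":": 6}
BOS = "<s>"


def toy_tokenize(text, add_special_tokens=True):
    """Split text into pieces, mimicking SentencePiece's dummy prefix."""
    # Replace spaces with "▁" and prepend one "▁" to the whole input.
    s = "▁" + text.replace(" ", "▁")
    pieces = []
    i = 0
    while i < len(s):
        # Greedy longest match against the toy vocabulary
        # (a simplification of the real segmentation algorithm).
        for j in range(len(s), i, -1):
            if s[i:j] in TOY_VOCAB:
                pieces.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"no piece covers {s[i]!r}")
    if add_special_tokens:
        pieces = [BOS] + pieces
    return pieces


def toy_encode(text, add_special_tokens=True):
    """Map the pieces of text to their (made-up) ids."""
    return [TOY_VOCAB[p] for p in toy_tokenize(text, add_special_tokens)]


# "A" alone becomes "▁A": the letter at the start of a word.
sA_id = toy_encode("A", add_special_tokens=False)[0]
# ":A" becomes "<s>", "▁:", "A": the last piece is a bare "A" with no "▁".
A_id = toy_encode(":A")[-1]

print(toy_tokenize("A", add_special_tokens=False), sA_id)
print(toy_tokenize(":A"), A_id)
```

With the real tokenizer, the same check can be made by passing each id to `tokenizer.convert_ids_to_tokens` and comparing the resulting pieces.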

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 1 year ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.