ymcui / Chinese-LLaMA-Alpaca-2

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Apache License 2.0
7.04k stars 581 forks

Added control to tokenizer for pad_token #418

Open yusufcakmakk opened 9 months ago

yusufcakmakk commented 9 months ago

I realized that there is no check for the padding token when using the SFT trainer. This change lets us control the pad token for custom tokenizers.
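The idea can be sketched as a small guard that runs before SFT training: if the tokenizer has no `pad_token`, fall back to another special token instead of failing. This is a minimal sketch following the Hugging Face tokenizer attribute conventions (`pad_token`, `eos_token`, `unk_token`); the helper name `ensure_pad_token` is hypothetical, not the exact code from this PR.

```python
def ensure_pad_token(tokenizer):
    """Hypothetical helper: make sure a tokenizer has a pad_token before SFT.

    Falls back to the eos token, then the unk token, mirroring a common
    workaround for tokenizers that ship without an explicit pad token.
    """
    if tokenizer.pad_token is None:
        if tokenizer.eos_token is not None:
            tokenizer.pad_token = tokenizer.eos_token
        elif tokenizer.unk_token is not None:
            tokenizer.pad_token = tokenizer.unk_token
        else:
            raise ValueError("Tokenizer has no pad/eos/unk token to use for padding.")
    return tokenizer
```

With a Hugging Face tokenizer one would call this right after `AutoTokenizer.from_pretrained(...)` and before constructing the trainer, so batch padding never hits a missing `pad_token`.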

iMountTai commented 9 months ago

Our tokenizer already contains a pad_token, so we will not merge this PR for now. Thank you for your contribution.