ymcui / Chinese-LLaMA-Alpaca-2

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Apache License 2.0
7.04k stars 581 forks

Added control to tokenizer for pad_token #418

Open yusufcakmakk opened 9 months ago

yusufcakmakk commented 9 months ago

I realized that there is no check for the padding token when using the SFT trainer. This change lets us control the pad token for custom tokenizers.
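The idea can be sketched as a small guard that runs before SFT training: if the tokenizer has no `pad_token`, fall back to another special token instead of failing. This is a minimal sketch following the Hugging Face tokenizer attribute conventions (`pad_token`, `eos_token`, `unk_token`); the helper name `ensure_pad_token` is hypothetical, not the exact code from this PR.

```python
def ensure_pad_token(tokenizer):
    """Hypothetical helper: make sure a tokenizer has a pad_token before SFT.

    Falls back to the eos token, then the unk token, mirroring a common
    workaround for tokenizers that ship without an explicit pad token.
    """
    if tokenizer.pad_token is None:
        if tokenizer.eos_token is not None:
            tokenizer.pad_token = tokenizer.eos_token
        elif tokenizer.unk_token is not None:
            tokenizer.pad_token = tokenizer.unk_token
        else:
            raise ValueError("Tokenizer has no pad/eos/unk token to use for padding.")
    return tokenizer
```

With a Hugging Face tokenizer one would call this right after `AutoTokenizer.from_pretrained(...)` and before constructing the trainer, so batch padding never hits a missing `pad_token`.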

iMountTai commented 9 months ago

Our tokenizer already contains a pad_token, so we will not merge this PR for now. Thank you for your contribution.