Change Llama tokenizer from LlamaTokenizer to AutoTokenizer

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Apache License 2.0

2.94k stars, 451 forks

Closed princepride closed 1 month ago

princepride commented 1 month ago

Loading the Llama 3 tokenizer files with LlamaTokenizer throws an error. See: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/37

We should change LlamaTokenizer to AutoTokenizer.
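A minimal sketch of the proposed change, assuming the Hugging Face `transformers` library is installed. The helper name `load_tokenizer` and its `loader` parameter are hypothetical and used only for illustration; they are not taken from this repository. `AutoTokenizer.from_pretrained` picks the tokenizer class from the checkpoint's own configuration, so the calling code does not need to hard-code `LlamaTokenizer`.

```python
from typing import Any, Callable, Optional


def load_tokenizer(
    model_name_or_path: str,
    loader: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Any:
    """Load a tokenizer with AutoTokenizer unless a custom loader is supplied.

    `loader` is a hypothetical hook that exists only so the helper can be
    exercised without downloading a checkpoint.
    """
    if loader is None:
        # Imported lazily so the module can be imported without transformers.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    # Extra keyword arguments (for example use_fast=True) are passed through unchanged.
    return loader(model_name_or_path, **kwargs)


# Example call (downloads tokenizer files; the Meta-Llama-3 repos are gated on the Hub):
# tokenizer = load_tokenizer("meta-llama/Meta-Llama-3-8B-Instruct")
```

Keeping the loading logic behind one helper means a script that supports several model families only needs the change in one place.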

shibing624 commented 1 month ago

Then wouldn't it be fine to just use Auto?