jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
https://jianzhnie.github.io/llmtech/
Apache License 2.0
561 stars 61 forks

Tokenizer class BaiChuanTokenizer does not exist or is not currently imported. #11

Open corlin opened 1 year ago

corlin commented 1 year ago

The error message is as follows:

Traceback (most recent call last):

/Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in <module>

  393
  394
  395 if __name__ == '__main__':
❱ 396     main()
  397

/Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main

  309     set_seed(args.seed)
  310
  311     # Tokenizer
❱ 312     tokenizer = AutoTokenizer.from_pretrained(
  313         args.model_name_or_path,
  314         cache_dir=args.cache_dir,
  315         padding_side='right',

/Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in from_pretrained

  685                 tokenizer_class_candidate = config_tokenizer_class
  686                 tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
  687             if tokenizer_class is None:
❱ 688                 raise ValueError(
  689                     f"Tokenizer class {tokenizer_class_candidate} does not exist or is n
  690                 )
  691             return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input

ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

corlin commented 1 year ago

Environment: macOS on M1.

jianzhnie commented 1 year ago

You should first download the BaiChuanTokenizer and the BaiChuan model checkpoint from https://huggingface.co/baichuan-inc/baichuan-7B

jianzhnie commented 1 year ago

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

corlin commented 1 year ago

> You should download the BaiChuanTokenizer and BaiChuan Model Checkpoint from the https://huggingface.co/baichuan-inc/baichuan-7B first

[image]

But the model directory already contains all of the relevant files.

jianzhnie commented 1 year ago

Run the following example to check that the model and tokenizer load correctly and that inference works:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_download_model_path", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("your_download_model_path", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

RIU-13 commented 1 year ago

I got this error when I left out trust_remote_code; adding it fixed the problem.
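
For anyone else hitting this, here is a minimal sketch of how one might check whether a downloaded checkpoint directory needs trust_remote_code=True before calling AutoTokenizer.from_pretrained. It assumes the checkpoint's tokenizer_config.json declares its custom tokenizer through an "auto_map" entry, which is how checkpoints that ship their own tokenizer code are usually set up; that has not been verified against this repo's scripts. The helper name needs_trust_remote_code is hypothetical, not part of transformers or LLamaTuner.

```python
import json
from pathlib import Path


def needs_trust_remote_code(model_dir: str) -> bool:
    """Guess whether the tokenizer in model_dir is defined by custom code.

    Hypothetical helper: it only inspects tokenizer_config.json and does not
    import transformers or load anything from the checkpoint.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        # Nothing to inspect, so make no claim about custom code.
        return False

    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)

    # Assumption: an "auto_map" entry for AutoTokenizer points at a class
    # defined in a Python file shipped alongside the checkpoint, which
    # transformers only loads when trust_remote_code=True is passed.
    auto_map = config.get("auto_map") or {}
    return "AutoTokenizer" in auto_map
```

If this returns True for your local directory, passing trust_remote_code=True to both AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained, as in the examples above, is probably what is missing.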