can't load tokenizer - Githubissues

Guo-Chenxu commented 1 year ago

I ran the code with the following command:

python finetune.py \
    --base_model='/home/guochenxu/pythonProjects/alpaca-lora/alpaca-lora-7b' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./lora-alpaca-512-qkvo' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --micro_batch_size=8

I downloaded the model files from https://huggingface.co/tloen/alpaca-lora-7b, and my directory looks like this:

But I get the following error:

It seems I don't have the tokenizer files. How can I get them, or is there another way to solve this problem?

I'm a beginner, so this question may seem a little stupid, but I have searched the web and still haven't solved it. I would appreciate it if anyone could answer.

hychaochao commented 1 year ago

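I'll just reply in Chinese; I don't know whether you've solved this yet. I'm fairly new to this too. From your command, it looks like you want to fine-tune further on top of alpaca-lora? You can do it like this: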

python finetune_copy.py \
    --base_model 'llama1' \
    --data_path 'XXX.json' \
    --output_dir './lora-alpaca' \
    --resume_from_checkpoint 'tloen/alpaca-lora'

Guo-Chenxu commented 1 year ago

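Thank you for your answer. My problem was that I had downloaded the wrong model; I should have used alpaca-7b. (I have to admit, that really was pretty stupid 😂)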
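
For anyone who hits the same error, a quick check of what a local directory actually contains can tell a LoRA adapter apart from a full base model before passing it to --base_model. The snippet below is only a rough sketch: the helper name describe_model_dir is hypothetical, and the file names it looks for (adapter_config.json, config.json, tokenizer.model, tokenizer.json, tokenizer_config.json) are assumptions about typical Hugging Face layouts, not something confirmed in this thread.

```python
from pathlib import Path

# Assumed marker file names (typical Hugging Face layouts, not confirmed in this thread).
ADAPTER_MARKERS = {"adapter_config.json"}
TOKENIZER_MARKERS = {"tokenizer.model", "tokenizer.json", "tokenizer_config.json"}


def describe_model_dir(path):
    """Hypothetical helper: classify a local directory by the files it contains."""
    names = {p.name for p in Path(path).iterdir() if p.is_file()}
    has_adapter = bool(names & ADAPTER_MARKERS)
    has_config = "config.json" in names
    has_tokenizer = bool(names & TOKENIZER_MARKERS)

    if has_adapter and not has_config:
        # Adapter weights only: probably not usable as --base_model.
        return "lora-adapter"
    if has_config and has_tokenizer:
        return "base-model"
    if has_config:
        return "base-model-without-tokenizer"
    return "unknown"
```

If this returns "lora-adapter" for the directory given to --base_model, the directory likely holds only adapter weights, which would explain why no tokenizer files are found there.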

tloen / alpaca-lora

can't load tokenizer #603