Closed by bytes-lost 1 year ago
Same Error
gpu: 8*A100 40G
pytorch: 2.0.0
cuda version: 11.7
deepspeed: 0.9.0+0b5252b
transformers: 4.28.0.dev
deepspeed main.py \
--data_path BelleGroup/train_1M_CN \
--model_name_or_path gpt-neox-20b/ \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--max_seq_len 512 \
--learning_rate 9.65e-5 \
--weight_decay 0.1 \
--num_train_epochs 2 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--lora_dim 128 \
--only_optimize_lora \
--zero_stage 3 \
--deepspeed \
--output_dir $OUTPUT_PATH
Same error here.
I got the same error
The script works for OPT models but does not work for other models. I guess it has something to do with the model format.
I found the solution. Basically you have to change --lora_module_name decoder.layers.
to the appropriate name for your model, for example --lora_module_name h.
for BLOOM and GPT-Neo.
Thanks for your suggestion! Do you know what the lora_module_name is for the LLaMA model?
Thank you @puyuanOT :). Yes, the LoRA replacement is based on the model architecture (or rather the module names).
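For anyone wondering how name-based replacement works in practice: below is a minimal sketch of the idea (not DeepSpeed-Chat's actual implementation; LoraLinear and convert_to_lora are hypothetical names). The prefix you pass as --lora_module_name (e.g. decoder.layers. for OPT, h. for BLOOM/GPT-Neo) is substring-matched against model.named_modules(), and every matching nn.Linear is wrapped with a low-rank adapter.

```python
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Minimal LoRA wrapper: frozen base linear plus a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the original weights
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.zeros(r, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.normal_(self.lora_a, std=0.02)    # lora_b stays zero, so the
                                                  # wrapped layer starts as identity delta

    def forward(self, x):
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

def convert_to_lora(model: nn.Module, part_module_name: str, r: int = 8):
    """Wrap every nn.Linear whose qualified name contains part_module_name."""
    # Collect target names first so we don't mutate the module tree while iterating.
    targets = [name for name, mod in model.named_modules()
               if part_module_name in name and isinstance(mod, nn.Linear)]
    for name in targets:
        parent = model.get_submodule(name.rsplit(".", 1)[0]) if "." in name else model
        child = name.rsplit(".", 1)[-1]
        setattr(parent, child, LoraLinear(getattr(parent, child), r))
    return model
```

This is why the flag is model-specific: if the prefix never appears in the model's module names, nothing gets wrapped and --only_optimize_lora leaves no trainable parameters, which is the kind of failure seen here.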
You can run this code:

from transformers import AutoModel
model = AutoModel.from_pretrained("llama-7b-zpn")
for name, module in model.named_modules():
    print(name)

You will see the module names start with layers., so passing --lora_module_name layers. works.
Yup, that's what I did. It runs now. Thanks! :D
nice