microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

lora error in stage 1 #3339

Open wac81 opened 1 year ago

wac81 commented 1 year ago

I am using the LoRA params in stage 1. Here is the command:

deepspeed main.py \
   --lora_dim 8 --only_optimize_lora \
   --data_path wangrui6/Zhihu-KOL Cohere/miracl-zh-queries-22-12 Hello-SimpleAI/HC3-Chinese mkqa-Chinese \
   --data_split 10,0,0 \
   --model_name_or_path $MODEL \
   --per_device_train_batch_size 2 \
   --per_device_eval_batch_size 2 \
   --learning_rate 9.65e-6 \
   --num_train_epochs 16  \
   --deepspeed --seed 1234 --num_warmup_steps 0 \
   --lr_scheduler_type cosine \
   --output_dir $OUTPUT_PATH \
   &> $OUTPUT_PATH/training.log

I added the LoRA arguments --lora_dim 8 --only_optimize_lora and got this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

zhan0903 commented 1 year ago

I met the same error. Is there any solution to this?

zhujc000 commented 1 year ago

same error

kisseternity commented 1 year ago

Same error here. I tried it in DeepSpeed Chat, and it works fine if I don't use the only_optimize_lora param.

li995495592 commented 1 year ago

Different models have different module names, and your fine-tuned model is different from the OPT model. If you use the argument only_optimize_lora, you should also set the argument lora_module_name to mark the layers you want to optimize with LoRA. You can load your own model and run print(list(model.named_modules())) to see its module names. You can learn this from the LoRA source code of DeepSpeed-Chat, at training/utils/module/lora.py. The LoRA method usually adds an extra low-rank net to the Linear layers; see the official source code of the LoRA paper: https://github.com/microsoft/LoRA/blob/main/loralib/layers.py. Hope it helps you.
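
A minimal sketch of how one might pick a suitable lora_module_name, assuming a Hugging Face transformers model (the model name and the chosen substring below are only placeholders for illustration):

from transformers import AutoModelForCausalLM

# Load your own model and list every submodule name so you can see which
# Linear layers you want to wrap with LoRA.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # placeholder model
for name, module in model.named_modules():
    print(name, type(module).__name__)

# Pick a substring that appears in the names of the Linear layers you want to
# adapt; for OPT the attention/MLP projections live under names like
# "model.decoder.layers.<n>.self_attn.q_proj", so a substring such as the
# following would match them:
lora_module_name = "decoder.layers."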

bing0037 commented 1 year ago

Here are my steps to add LoRA to a DeepSpeed model:

1) Copy lora.py into your own codebase: lora.py

2) Add LoRA to your own model:

model = BigModel()  # define your own model

import deepspeed
from lora import convert_linear_layer_to_lora, only_optimize_lora_parameters  # from your copied lora.py

lora_dim = 6
lora_module_name = "xxx"  # use a module name of your own model (e.g. "model.layers.30.self_attn.q_proj")
only_optimize_lora = True

if lora_dim > 0:
    model = convert_linear_layer_to_lora(model, lora_module_name, lora_dim)
    if only_optimize_lora:
        model = only_optimize_lora_parameters(model)
        # added to avoid the error "RuntimeError: element 0 of tensors does not
        # require grad and does not have a grad_fn"
        # (ref: https://github.com/huggingface/peft/issues/137#issuecomment-1445912413)
        model.enable_input_require_grads()

# The following are normal operations, just to show the position of the above code:
optimizer_grouped_parameters = get_optimizer_grouped_parameters(xxx)
optimizer = xxx
lr_scheduler = xxx
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# the DeepSpeed engine is defined here!
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, optimizer=optimizer, lr_scheduler=lr_scheduler, config=xxx)
train(model)
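
As a sanity check (not part of the original steps), one can verify right after these calls that some parameters still require gradients; the RuntimeError above is raised precisely when none do. The lora_right_weight / lora_left_weight names below assume the DeepSpeed-Chat lora.py implementation:

# Confirm that the LoRA weights added by convert_linear_layer_to_lora are
# still trainable before handing the model to deepspeed.initialize.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable parameter tensors")
print(trainable[:5])  # expect names containing "lora_right_weight" / "lora_left_weight"
assert len(trainable) > 0, "no trainable parameters: lora_module_name did not match any layer"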

Hope this helps :)