wac81 opened this issue 1 year ago
I ran into the same error. Is there any solution to this?
same error
Same error here. I hit it in DeepSpeed Chat too, but it's fine if I don't use the only_optimize_lora param.
Different models have different module names, and your fine-tuned model is different from the OPT model. If you use the argument only_optimize_lora, you should also set the argument lora_module_name to mark the layers you want LoRA to optimize. You can load your own model and run `print(list(model.named_modules()))` to see its modules. You can learn this from the LoRA source code of DeepSpeed-Chat, at `training/utils/module/lora.py`. The LoRA method usually wraps linear layers with an additional low-rank net; see the official source code of the LoRA paper: https://github.com/microsoft/LoRA/blob/main/loralib/layers.py. Hope it can help you.
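For example, here is a quick way to list candidate module names (a minimal sketch; `facebook/opt-1.3b` is just a stand-in for your own checkpoint):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Stand-in checkpoint; replace with your own fine-tuned model.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# LoRA wraps linear layers, so list only those to pick a
# lora_module_name substring (e.g. "decoder.layers." for OPT).
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)
```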
Here are my steps to add LoRA to a DeepSpeed model:

1) Copy lora.py into your own codebase: lora.py

2) Add LoRA to your own model:
```python
import deepspeed

# import the helpers from your copied lora.py
from lora import convert_linear_layer_to_lora, only_optimize_lora_parameters

model = BigModel()  # define your own model

lora_dim = 6
# use a module-name substring from your own model
# (e.g. "model.layers.30.self_attn.q_proj")
lora_module_name = "xxx"
only_optimize_lora = True

if lora_dim > 0:
    model = convert_linear_layer_to_lora(model, lora_module_name, lora_dim)
    if only_optimize_lora:
        model = only_optimize_lora_parameters(model)
        # added to avoid "RuntimeError: element 0 of tensors does not
        # require grad and does not have a grad_fn"
        # (ref: https://github.com/huggingface/peft/issues/137#issuecomment-1445912413)
        model.enable_input_require_grads()

# The following are normal operations, shown only to mark where the
# code above belongs:
optimizer_grouped_parameters = get_optimizer_grouped_parameters(xxx)
optimizer = xxx
lr_scheduler = xxx
# the DeepSpeed engine is created here!
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, optimizer=optimizer, lr_scheduler=lr_scheduler, config=xxx)
train(model)
```
Hope this helps :)
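To make step 2 more concrete: a LoRA-wrapped linear layer computes y = W x + (alpha / r) * B A x, with only the low-rank factors A and B trained. Below is a simplified sketch in the spirit of the official layers.py linked above; the class and variable names are illustrative, not DeepSpeed's actual implementation:

```python
import math
import torch
import torch.nn as nn

class LinearWithLoRA(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False  # freeze the original weight
        # low-rank factors: A is (r, in), B is (out, r); B starts at zero,
        # so training begins exactly from the frozen pretrained weights
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Roughly speaking, only_optimize_lora_parameters then freezes every parameter that is not a LoRA weight, which is why enable_input_require_grads() is needed to keep a gradient path alive through the frozen base model.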
I used the LoRA params in stage 1. Here is the context, with the LoRA flags added:

```
--lora_dim 8 --only_optimize_lora \
```

and I get this error: