microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

Question: Why did you implement LoRA by hand instead of using peft? #830

Open kwonmha opened 11 months ago

kwonmha commented 11 months ago

peft looks more convenient for integrating LoRA. Is it because of your company's policy, or is there another reason?

EeyoreLee commented 11 months ago

@kwonmha - I think it's mostly an unimportant choice, like how some people prefer loguru and others prefer native logging. That said, using the peft package from HF has a couple of drawbacks. First, peft needs a regex to match module names, which makes the code less readable, or else you have to list every full module name, which loses generality. The second issue is gradient checkpointing: peft wraps the module, so the wrapped model can lose methods like gradient_checkpointing_enable, and you then have to enable it right when the base transformer model is initialized, which forces you to keep the transformer model and the v_head separate. The DS team may prefer a model without any wrapper, just as it says in the comments of deepspeed.initialize.
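
For context, here is a minimal sketch of the two module-targeting styles in peft's `LoraConfig` that the comment above alludes to. The `r`/`lora_alpha` values and the module names (`q_proj`, `v_proj`) are illustrative assumptions for a typical decoder-style model, not something taken from this thread:

```python
# Sketch only: the two ways peft's LoraConfig can select target modules.
from peft import LoraConfig

# Option 1: a single string is treated as a regex matched against the full
# module path -- compact, but less readable at a glance.
regex_config = LoraConfig(
    r=8,                                      # assumed LoRA rank
    lora_alpha=16,                            # assumed scaling factor
    target_modules=r".*\.(q_proj|v_proj)",    # regex over full module names
)

# Option 2: an explicit list of module-name suffixes -- readable, but tied
# to one architecture's naming, so it loses generality across models.
list_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
)
```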
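
And a hedged sketch of the gradient-checkpointing ordering issue described above: call `gradient_checkpointing_enable` on the base transformer before peft wraps it with `get_peft_model`. The checkpoint name and LoRA settings here are assumptions for illustration, not the DS team's actual configuration:

```python
# Sketch only: enable gradient checkpointing while the model is unwrapped,
# since the peft wrapper may not expose the base model's methods directly.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # assumed model
base_model.gradient_checkpointing_enable()  # call on the unwrapped base model

peft_model = get_peft_model(
    base_model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]),
)
peft_model.print_trainable_parameters()  # sanity check: only LoRA params train
```

This ordering is also why the comment mentions keeping the transformer model and the v_head separate: once the base model is wrapped, anything you still need to call on it has to have been set up beforehand.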