nbasyl / DoRA

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
https://arxiv.org/abs/2402.09353

I find some confusing code in peft #2

Open guokan987 opened 8 months ago

guokan987 commented 8 months ago

Code:

```python
result_dora = (mag_norm_scale - 1) * (
    F.linear(x, transpose(weight, self.fan_in_fan_out))
) + mag_norm_scale * lora_B(lora_A(x)) * scaling
```

Question: what is the effect of `(mag_norm_scale - 1)` and `mag_norm_scale`? Also, because of the `(mag_norm_scale - 1)` term, `result_dora` cannot equal `F.linear(x, transpose(weight, self.fan_in_fan_out))` at the initialization stage.

nbasyl commented 8 months ago

Hi, you can refer to this formula:

$$W' = m\,\frac{V + \Delta V}{\lVert V + \Delta V \rVert_c} = m\,\frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}$$
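For concreteness, here is a minimal sketch of that decomposition (my own illustration, not the `peft` code; `W0`, `A`, `B`, and `m` are placeholder names, and the norm is taken per output channel as in `peft`'s `Linear` layer). It also shows that with `lora_B` zero-initialized and `m` initialized to the channel norms of `W0`, the reparameterized weight equals the pretrained weight, which addresses the initialization concern above:

```python
import torch

torch.manual_seed(0)

out_f, in_f, r = 6, 4, 2
W0 = torch.randn(out_f, in_f)           # frozen pretrained direction V
A = torch.randn(r, in_f)                # LoRA down-projection
B = torch.zeros(out_f, r)               # LoRA up-projection, zero-initialized
m = W0.norm(dim=1, keepdim=True)        # magnitude, initialized to ||W0|| per channel

def dora_weight(W0, A, B, m):
    # W' = m * (V + dV) / ||V + dV||, norm over each output channel
    V = W0 + B @ A
    return m * V / V.norm(dim=1, keepdim=True)

# With B = 0 we have dV = 0 and m = ||W0||, so W' == W0 at initialization.
print(torch.allclose(dora_weight(W0, A, B, m), W0))  # True
```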

guokan987 commented 8 months ago

Thanks, authors, but I can't understand why the following equation is used during training:

$$XW_0 + \text{dropout}(X)\,\frac{m}{\lVert V + \Delta V \rVert}\,(V + \Delta V)$$

The effect of the `(mag_norm_scale - 1)` term is still unclear.

nbasyl commented 8 months ago

Note that `(mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out)))` must be included to properly apply dropout; otherwise, the outcome would be inaccurate. You can refer to https://github.com/huggingface/peft/pull/1474, where we discuss this.
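A small numerical sketch of that point, assuming the dropout convention discussed in the PR (the base path sees the un-dropped input, while the DoRA delta $W' - W_0$ sees the dropped input; variable names and layout are my own, not the `peft` code verbatim). The `(mag_norm_scale - 1)` term is exactly the rescaling of $W_0$ contained in $W' - W_0$:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch, in_f, out_f, r, scaling = 3, 4, 6, 2, 1.0
x = torch.randn(batch, in_f)
W0 = torch.randn(out_f, in_f)                       # frozen base weight V
A = torch.randn(r, in_f)
B = torch.randn(out_f, r)
m = torch.rand(out_f)                               # learned magnitude vector

weight_norm = (W0 + scaling * B @ A).norm(dim=1)    # ||V + dV|| per output channel
mag_norm_scale = (m / weight_norm).view(1, -1)      # (1, out_f), broadcasts over the batch

x_drop = F.dropout(x, p=0.1, training=True)         # dropout hits only the adapter delta
base_result = F.linear(x, W0)                       # base path sees the un-dropped x

# PEFT-style DoRA delta on the dropped input:
result_dora = (mag_norm_scale - 1) * F.linear(x_drop, W0) + (
    mag_norm_scale * F.linear(F.linear(x_drop, A), B) * scaling
)
y = base_result + result_dora

# Equivalent form: base output plus (W' - W0) applied to dropout(x),
# where W' = m * (W0 + scaling * B @ A) / ||W0 + scaling * B @ A||.
W_prime = mag_norm_scale.T * (W0 + scaling * B @ A)
y_ref = base_result + F.linear(x_drop, W_prime - W0)
print(torch.allclose(y, y_ref, atol=1e-6))          # True
```

Dropping the `(mag_norm_scale - 1)` term would amount to rescaling only the low-rank branch and not the $W_0$ component of $W'$, so the result would no longer match the decomposed weight $W'$ applied to the (dropped) input.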