Closed: hhnqqq closed this issue 4 days ago
import numpy as np

def inv_train(rate, x, w1, b1, w2, b2):
    # Training-time forward pass with inverted dropout: units are dropped with
    # probability `rate` and the survivors are rescaled by 1 / (1 - rate).
    layer1 = np.maximum(0, np.dot(w1, x) + b1)
    mask1 = np.random.binomial(1, 1 - rate, layer1.shape)  # keep each unit with prob 1 - rate
    layer1 = layer1 * mask1
    layer1 /= 1 - rate
    layer2 = np.maximum(0, np.dot(w2, layer1) + b2)
    mask2 = np.random.binomial(1, 1 - rate, layer2.shape)  # keep each unit with prob 1 - rate
    layer2 = layer2 * mask2
    layer2 /= 1 - rate
    return layer2

def inv_test(x, w1, b1, w2, b2):
    # Test-time forward pass: the rescaling already happened during training,
    # so no extra scaling is applied here.
    layer1 = np.maximum(0, np.dot(w1, x) + b1)
    layer2 = np.maximum(0, np.dot(w2, layer1) + b2)
    return layer2
Actually, we do not need to re-scale the LoRA weight, my bad.
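As a quick sanity check (a hypothetical snippet with made-up shapes, not taken from any library), the expected value of an inverted-dropout activation already equals the un-dropped activation, so no extra (1 - rate) scaling is needed at test time or at merge time:

import numpy as np

np.random.seed(0)
rate = 0.3
h = np.maximum(0, np.random.randn(200))  # some post-ReLU activations

# Apply inverted dropout many times and average: survivors are rescaled by
# 1 / (1 - rate), so the per-unit expectation matches the original activations.
masks = np.random.binomial(1, 1 - rate, size=(20000, h.size))
dropped = masks * h / (1 - rate)

print(np.allclose(dropped.mean(axis=0), h, atol=0.05))  # True, up to sampling noise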
Thanks for bringing this up and investigating it further. I think this is solved, which is why I'm closing the issue. If there are still open questions, feel free to re-open it.
Description:
The Dropout layer behaves differently during training and evaluation, which leads to inconsistent behavior in the lora_A module. Specifically, during training, dropout(x) drops some of the outputs of XA, whereas during evaluation dropout is disabled and a scaling factor of 1 - p is multiplied with XA. To ensure that the merged LoRA weights behave the same as the un-merged LoRA weights, the dropout scaling should also be applied inside the merge method. Currently, PEFT does not appear to handle this discrepancy.
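To illustrate what "same behavior" means here, a minimal numpy sketch (not PEFT's actual code; W, A, B, scaling, and p are placeholder names for the frozen base weight, the LoRA matrices, the LoRA scaling factor, and the dropout probability): with standard inverted dropout, evaluation is a no-op, so the un-merged eval-time forward and the merged-weight forward already agree without any extra (1 - p) factor.

import numpy as np

np.random.seed(0)
d_in, d_out, r = 8, 6, 2
x = np.random.randn(d_in)
W = np.random.randn(d_out, d_in)   # frozen base weight
A = np.random.randn(r, d_in)       # lora_A
B = np.random.randn(d_out, r)      # lora_B
scaling = 0.5                      # LoRA scaling factor (alpha / r)
p = 0.1                            # dropout prob; at eval time dropout is the identity

# Un-merged eval-mode forward: dropout contributes nothing at evaluation.
unmerged = W @ x + scaling * (B @ (A @ x))

# Merged forward: fold scaling * B A into the base weight, with no (1 - p) factor.
merged = (W + scaling * B @ A) @ x

print(np.allclose(unmerged, merged))  # True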
Question
Is there any special handling for dropout scaling in the PEFT library that I might have missed?
Related code
Merge method
Compute LoRA weight method
Forward method of a LoRA layer