Open YoucanBaby opened 1 month ago
Hello, I am also curious about it. Have you got any clue about this?
Sorry, no clues so far.
This is probably because there is a torch.nn.LayerNorm immediately before the to_k_ip linear layer. LayerNorm normalizes the input of the linear layer to zero mean. Let the input of the linear layer be x = [x_1, x_2, ..., x_N] and the output be y = [y_1, y_2, ..., y_M], so y_1 = x_1 w_11 + x_2 w_21 + ... + x_N w_N1, and LayerNorm guarantees mean(x) = 0. During backpropagation, let D be the gradient flowing into y_1 from the layers after it. The gradient of each weight w_i1 is x_i * D, so the total update to w_11 through w_N1 is (x_1 + x_2 + ... + x_N) * D = N * mean(x) * D = 0. In other words, each column sum of the to_k_ip weight matrix is invariant during training and stays equal to its value at initialization, and I guess the weights were initialized so that this sum is zero.
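A minimal NumPy sketch of this argument (not the repo's actual code; the zero-mean input stands in for a LayerNorm output, and scale/shift are ignored for simplicity): one SGD step on a linear layer leaves the column sums of the weight matrix unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 4  # input and output dimensions (illustrative sizes)

# Stand-in for a LayerNorm output: an input with exactly zero mean
x = rng.normal(size=N)
x = x - x.mean()  # mean(x) == 0, as LayerNorm guarantees

W = rng.normal(size=(N, M))  # linear layer weights, y = x @ W
col_sums_before = W.sum(axis=0).copy()

# One SGD step with an arbitrary upstream gradient D = dL/dy
D = rng.normal(size=M)
grad_W = np.outer(x, D)  # dL/dW[i, j] = x_i * D_j
W -= 0.1 * grad_W

# Column sum of the update: sum_i x_i * D_j = (sum_i x_i) * D_j = 0,
# so each column sum of W is unchanged by the step
assert np.allclose(W.sum(axis=0), col_sums_before)
print("column sums preserved")
```

So if a column of to_k_ip sums to zero at initialization, it sums to zero after any number of such updates.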
Dear author,
Thanks a lot for your great project.
Why is the mean of model weights all 0? Did you use any training tricks?
Best regards.