tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
Apache License 2.0
5.04k stars 327 forks

Why is the mean value of the model weights all 0? #417

Open YoucanBaby opened 1 month ago

YoucanBaby commented 1 month ago

Dear author,

Thanks a lot for your great project.

Why is the mean of model weights all 0? Did you use any training tricks?

Best regards.

zheng-jiawen commented 1 month ago

Hello, I am also curious about it. Have you got any clue about this?

YoucanBaby commented 1 month ago

Sorry, no clues so far.

jxl0131 commented 1 week ago

This is probably because there is a `torch.nn.LayerNorm` before the `to_k_ip` linear layer. LayerNorm normalizes the input of that linear layer to a mean of 0. Let the input of this linear layer be $x = [x_1, x_2, \dots, x_N]$ and the output be $y = [y_1, y_2, \dots, y_M]$; LayerNorm makes $\operatorname{mean}(x) = 0$. The first output is

$$y_1 = x_1 w_{11} + x_2 w_{21} + \dots + x_N w_{N1}.$$

During backpropagation, let $D = \partial L / \partial y_1$ be the gradient arriving at $y_1$ from the layers after it. Since $\partial L / \partial w_{n1} = x_n D$, the sum of the updates to the weights $w_{11}, \dots, w_{N1}$ is proportional to

$$x_1 D + x_2 D + \dots + x_N D = N \cdot \operatorname{mean}(x) \cdot D = 0.$$

The same argument holds for every output $y_m$. This means that the sum of the `to_k_ip` weights stays the same during training as it was at initialization, and I guess the sum of the weights at initialization is zero.
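A small numerical check of this argument, written as a sketch in pure Python so it needs no PyTorch. It assumes a LayerNorm with no affine weight/bias (equivalent to `nn.LayerNorm` at initialization, where the weight is 1 and the bias is 0), a bias-free linear layer in PyTorch's `(out_features, in_features)` layout, an arbitrary squared-error loss, and plain SGD. The helper names (`layer_norm`, `sgd_step`, `demo`) and the sizes are hypothetical, chosen only for illustration. Two caveats apply to real training: once a LayerNorm's learnable weight and bias move away from 1 and 0, its output mean is no longer exactly zero, and Adam-style optimizers rescale each parameter's update individually, so a zero-sum gradient does not guarantee a zero-sum update.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """LayerNorm without affine weight/bias: output has mean 0 over the features."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def linear(x, W):
    """Bias-free linear layer in PyTorch layout: W[m][n] is the weight from x[n] to y[m]."""
    return [sum(xn * w for xn, w in zip(x, row)) for row in W]


def row_sums(W):
    """Sum of the weights feeding each output y_m (w_1m + ... + w_Nm in the comment's notation)."""
    return [sum(row) for row in W]


def sgd_step(W, raw_x, target, lr):
    """One plain-SGD step on L = 0.5 * ||y - target||^2 with y = linear(layer_norm(raw_x), W)."""
    x = layer_norm(raw_x)
    y = linear(x, W)
    dy = [ym - tm for ym, tm in zip(y, target)]  # D_m = dL/dy_m
    grad = [[d * xn for xn in x] for d in dy]    # dL/dW[m][n] = D_m * x_n
    new_W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad)]
    return new_W, grad


def demo(n_in=8, n_out=4, steps=200, lr=0.05, seed=0):
    """Train a random layer; report row-sum drift, largest weight change, largest gradient row sum."""
    rng = random.Random(seed)
    W0 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    W = W0
    max_grad_row_sum = 0.0
    for _ in range(steps):
        raw_x = [rng.gauss(5.0, 3.0) for _ in range(n_in)]  # raw input with a non-zero mean
        target = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        W, grad = sgd_step(W, raw_x, target, lr)
        max_grad_row_sum = max(max_grad_row_sum, max(abs(s) for s in row_sums(grad)))
    sum_drift = max(abs(a - b) for a, b in zip(row_sums(W0), row_sums(W)))
    weight_change = max(abs(a - b) for r0, r in zip(W0, W) for a, b in zip(r0, r))
    return sum_drift, weight_change, max_grad_row_sum


if __name__ == "__main__":
    sum_drift, weight_change, max_grad_row_sum = demo()
    print(f"largest change of any single weight: {weight_change:.3f}")
    print(f"largest drift of a row sum:          {sum_drift:.2e}")
    print(f"largest |row sum| of any gradient:   {max_grad_row_sum:.2e}")
```

Individual weights move noticeably during training, while each row sum changes only by floating-point rounding, which is the behavior the comment predicts under these assumptions.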