Closed: MaroonAmor closed this issue 1 year ago.
Hi @yash0307,

I have another question about the layer normalization applied in the model here.

The paper states that layer normalization is used in the same way as in ProxyNCA++, where the layer normalization does not have trainable affine parameters. However, in your code, by default,

torch.nn.LayerNorm(self.model.head.in_features)

has trainable affine parameters enabled. Is there any reason why the trainable affine parameters are enabled here? Have you noticed any performance difference between the cases with and without trainable affine parameters in layer normalization?

Thanks.

Hi, we did not experiment with the trainable affine parameters disabled, so we cannot comment on whether it would make a difference in performance. The comment in the paper is meant to highlight that other related papers, such as ProxyNCA++, use layer normalization as well.
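For reference, a minimal sketch of the two configurations discussed above, assuming PyTorch and a hypothetical feature dimension of 768 standing in for self.model.head.in_features:

import torch

dim = 768  # hypothetical stand-in for self.model.head.in_features

# Default construction: trainable affine parameters (weight and bias) are enabled.
ln_default = torch.nn.LayerNorm(dim)
print(ln_default.elementwise_affine)  # True
print(ln_default.weight.shape)        # torch.Size([768])

# ProxyNCA++-style setup as described in the paper: disable the trainable
# per-element scale and shift.
ln_no_affine = torch.nn.LayerNorm(dim, elementwise_affine=False)
print(ln_no_affine.elementwise_affine)  # False
print(ln_no_affine.weight)              # None

With elementwise_affine=False the module still normalizes over the last dimension; it simply omits the learned gamma and beta parameters.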