Closed: MaroonAmor closed this issue 1 year ago.
Hi @yash0307,

I have another question about the layer normalization applied in the model here.

The paper states that layer normalization is used in the same way as in ProxyNCA++, where the layer normalization does not have trainable affine parameters. However, in your code, by default,

torch.nn.LayerNorm(self.model.head.in_features)

has trainable affine parameters enabled. Is there any reason why the trainable affine parameters are enabled here? Have you noticed any performance difference between the cases with and without trainable affine parameters in layer normalization?

Thanks.

Hi, we did not experiment with the trainable affine parameters disabled, so we cannot comment on whether it would make a difference in performance. The comment in the paper is meant to highlight that other related papers, such as ProxyNCA++, use layer normalization as well.
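For reference, a minimal sketch of the two configurations discussed above, assuming PyTorch and a hypothetical feature dimension of 768 standing in for self.model.head.in_features:

import torch

dim = 768  # hypothetical stand-in for self.model.head.in_features

# Default construction: trainable affine parameters (weight and bias) are enabled.
ln_default = torch.nn.LayerNorm(dim)
print(ln_default.elementwise_affine)  # True
print(ln_default.weight.shape)        # torch.Size([768])

# ProxyNCA++-style setup as described in the paper: disable the trainable
# per-element scale and shift.
ln_no_affine = torch.nn.LayerNorm(dim, elementwise_affine=False)
print(ln_no_affine.elementwise_affine)  # False
print(ln_no_affine.weight)              # None

With elementwise_affine=False the module still normalizes over the last dimension; it simply omits the learned gamma and beta parameters.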