yash0307 / RecallatK_surrogate

Code for Recall@k Surrogate Loss with Large Batches and Similarity Mixup, CVPR 2022.
MIT License

Layer normalization w/ or w/o affine parameters #7

Closed MaroonAmor closed 1 year ago

MaroonAmor commented 2 years ago

Hi @yash0307,

I have another question about layer normalization applied in the model here.

The paper states that layer normalization is used similarly to the ProxyNCA++ method, where layer normalization does not have trainable affine parameters. However, in your code, torch.nn.LayerNorm(self.model.head.in_features) is constructed with its default settings, so trainable affine parameters are enabled (see the sketch below).
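
For reference, a minimal sketch of the two configurations; the embedding dimension used here is just an illustrative placeholder for self.model.head.in_features:

```python
import torch

embed_dim = 768  # hypothetical placeholder for self.model.head.in_features

# Default: elementwise_affine=True, so LayerNorm carries trainable gamma/beta.
ln_affine = torch.nn.LayerNorm(embed_dim)
print(sum(p.numel() for p in ln_affine.parameters() if p.requires_grad))  # 2 * embed_dim

# ProxyNCA++-style: no trainable affine parameters.
ln_no_affine = torch.nn.LayerNorm(embed_dim, elementwise_affine=False)
print(sum(p.numel() for p in ln_no_affine.parameters() if p.requires_grad))  # 0
```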

Is there any reason why the trainable affine parameters are enabled here? Have you noticed any performance difference between the cases with and without trainable affine parameters in layer normalization?

Thanks.

yash0307 commented 2 years ago

Hi, we did not experiment with the trainable affine parameters disabled, so we cannot comment on whether it makes a difference in performance. The statement in the paper is meant to highlight that other related papers, such as ProxyNCA++, use layer normalization as well.