universome / class-norm

Class Normalization for Continual Zero-Shot Learning

Why use additional multiplication of np.sqrt(attrs.shape[1]) in attribute normalization? #3


TrynBug commented 3 years ago

Attributes Normalization (AN) in the paper is defined as:

a_c / ||a_c||

But the code uses Attributes Normalization (AN) like this:

a_c / ||a_c|| * sqrt(d_a)

where d_a is the dimensionality of the attribute vector. The code for attribute normalization that I found (in the preprocessing part of the class-norm-for-czsl.ipynb file) is `attrs = attrs / attrs.norm(dim=1, keepdim=True) * np.sqrt(attrs.shape[1])`.
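For reference, here is a minimal, self-contained version of that preprocessing line (the attribute matrix and its shape below are made up purely for illustration; the notebook loads the real one from the dataset):

```python
import numpy as np
import torch

# Hypothetical attribute matrix: 50 classes, 85-dim attribute vectors
attrs = torch.rand(50, 85)

# L2-normalize each class attribute vector, then rescale by sqrt(d_a)
attrs = attrs / attrs.norm(dim=1, keepdim=True) * np.sqrt(attrs.shape[1])

print(attrs.norm(dim=1))  # every row now has norm sqrt(85) ≈ 9.22
```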

I couldn't find anything about this additional multiplication in the paper, and it seems to have a huge influence on performance. Can you tell me why it is used?

universome commented 3 years ago

Hi! We added this to avoid possible optimization issues that might arise from dividing the attributes by too large a number, which would squash their values: neural networks like their inputs to come from N(0, I), and doing `a_c / ||a_c|| * sqrt(d_a)` is conceptually similar to standardization in this case. Attribute normalization on its own is not supposed to work for deep models; we left it there to be more consistent with the linear case.
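To illustrate the scale argument (a rough sketch, not code from the repo): after `a_c / ||a_c|| * sqrt(d_a)`, the mean squared entry of each attribute vector is exactly 1, so the inputs live on a scale comparable to samples from N(0, I), whereas plain L2 normalization leaves the entries at a scale of about 1/d_a:

```python
import torch

d_a = 85                     # hypothetical attribute dimensionality
a = torch.rand(50, d_a)      # hypothetical attribute matrix

a_unit = a / a.norm(dim=1, keepdim=True)  # plain L2 normalization
a_scaled = a_unit * d_a ** 0.5            # with the extra sqrt(d_a) factor

print(a_unit.pow(2).mean(dim=1))    # ~1/85 per entry: values get squashed
print(a_scaled.pow(2).mean(dim=1))  # exactly 1: ||a_scaled||^2 / d_a == 1
```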

I believe that (for deep models) it is possible to replace all this with simple standardization, i.e. doing `(a_c - mean(a_c)) / std(a_c)`, but I just tried this (without any other changes) and it worked poorly, so I suspect the optimization hyperparams (learning rate, weight decay, scheduler, etc.) need to be adjusted somehow. There might also be some other explanation, but I will need to think about it.
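In case it helps, this is how I read that standardization variant, per class over the attribute dimensions (a sketch only; as said above, plugging it in without retuning the hyperparams worked poorly):

```python
import torch

attrs = torch.rand(50, 85)  # hypothetical class-attribute matrix

# Standardize each class attribute vector over its dimensions:
# zero mean and unit std per row, instead of the L2-norm rescaling.
attrs_std = (attrs - attrs.mean(dim=1, keepdim=True)) / attrs.std(dim=1, keepdim=True)

print(attrs_std.mean(dim=1))  # ~0 for every class
print(attrs_std.std(dim=1))   # 1 for every class
```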

P.S. Sorry for my late reply