Closed grondilu closed 4 years ago
The proposed initialization method is derived independently of Xavier initialization: under different assumptions, for a different nonlinearity, and by a different argument. The assumptions made in Glorot et al. 2010 do not apply to the sine nonlinearity ("Consider the hypothesis that we are in a linear regime at the initialization..."). Remarkably, the final result turns out to be the same, but that was neither known, trivial, nor clear before our analysis, so this is indeed a surprising finding of our paper!
I've just found out about the so-called Xavier initialization method, which, to me, seems to do what the paper claims as its main contribution.
It's an option to Mathematica's NetInitialize function.
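For readers unfamiliar with the method, here is a minimal sketch of Xavier (Glorot) uniform initialization in plain Python. It assumes the commonly cited formulation, where weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The function name `xavier_uniform` and its parameters are hypothetical, chosen for illustration; this is neither the paper's code nor what Mathematica's NetInitialize does internally.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out weight matrix (list of lists).

    Assumed formulation: entries are drawn from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), which gives each weight a
    variance of 2 / (fan_in + fan_out).
    """
    rng = random.Random(seed)  # local generator, so the sketch is reproducible
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]


# Usage: initialize a 256 -> 128 layer and compare the empirical variance
# of the weights with the target 2 / (fan_in + fan_out).
weights = xavier_uniform(256, 128, seed=0)
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
variance = sum((w - mean) ** 2 for w in flat) / len(flat)
print(len(weights), len(weights[0]), variance, 2.0 / (256 + 128))
```

The two variances printed at the end should be close, since a uniform distribution on (-a, a) has variance a²/3.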