Open zplovekq opened 4 years ago
```python
bias = np.sqrt(3.0 / embeddings.size(1))
torch.nn.init.uniform_(embeddings, -bias, bias)
```
This is the `lecun_uniform` way of initializing. Here the fan-in is the embedding dimension `emb_dim`, which the code obtains as `embeddings.size(1)`. The code samples values uniformly from the interval `(-bias, +bias)`, where `bias` is defined as `sqrt(3.0 / emb_dim)`.
The PyTorch default, on the other hand, is `init.normal_(self.weight)`. Why do this, and what is the reference?
Well, there is a whole area of research on why some initializations work better than simply sampling values from a plain uniform or Gaussian distribution. Some initializations, such as `lecun_uniform`, have been found empirically to perform better.

Here is one reference: initializers/lecun_uniform