Open zplovekq opened 4 years ago
```python
bias = np.sqrt(3.0 / embeddings.size(1))
torch.nn.init.uniform_(embeddings, -bias, bias)
```
This is the `lecun_uniform` way of initializing. Here the fan-in is the embedding dimension `emb_dim`, which the code obtains as `embeddings.size(1)`. The code samples values uniformly from the interval `(-bias, +bias)`, where `bias` is defined as `sqrt(3.0 / emb_dim)`.
The PyTorch default, on the other hand, is `init.normal_(self.weight)`. Why do this, and what is the reference?
Well, there is a whole area of research on why some initializations work better than simply sampling values from a plain uniform or Gaussian distribution. Some initializations, such as `lecun_uniform`, have been found empirically to perform better.

Here is one reference: initializers/lecun_uniform