Closed by hieuddo 4 years ago
For unknown words, we initialise the word embedding vector with random values in 300 dimensions. These values are sampled from a multivariate Gaussian defined by the mean and standard deviation of the candidate embedding values. This initialisation method is one of many options: for instance, we could also sample uniform noise from the interval [-0.5, 0.5], or draw from any other probability distribution. However, a Gaussian distribution is the common choice.
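To make the description above concrete, here is a minimal sketch of both options, written with the Python standard library only. It is not the code from the tutorial being quoted: the helper names (`fit_gaussian`, `init_unknown_gaussian`, `init_unknown_uniform`) and the toy "known" vectors are made up for illustration, and it assumes the mean and standard deviation are computed over all values of the known embeddings pooled together.

```python
import random
import statistics

EMBEDDING_DIM = 300  # dimensionality mentioned in the quoted text


def fit_gaussian(known_vectors):
    """Return (mean, std) over all values of the known embedding vectors.

    Assumption: one scalar mean and std pooled over every dimension.
    """
    values = [v for vec in known_vectors for v in vec]
    return statistics.fmean(values), statistics.pstdev(values)


def init_unknown_gaussian(mean, std, rng, dim=EMBEDDING_DIM):
    """Sample each component independently from N(mean, std^2)."""
    return [rng.gauss(mean, std) for _ in range(dim)]


def init_unknown_uniform(rng, dim=EMBEDDING_DIM, low=-0.5, high=0.5):
    """Alternative: sample each component uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(dim)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy stand-in for pretrained embeddings of known words (hypothetical data).
known = [[rng.gauss(0.0, 0.4) for _ in range(EMBEDDING_DIM)] for _ in range(100)]

mean, std = fit_gaussian(known)
unk_gaussian = init_unknown_gaussian(mean, std, rng)
unk_uniform = init_unknown_uniform(rng)

print(len(unk_gaussian), len(unk_uniform))
```

Matching the Gaussian to the statistics of the existing vectors keeps the random vectors on roughly the same scale as the known ones, which is presumably why the quoted text prefers it over an arbitrary fixed interval.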
Could you explain the embedding process? Why does the process work the way it does in the following code?