Intuition of word embedding

wuch15 / KDD-NPA

Resources for the paper "NPA: News Recommendation with Personalized Attention"


Intuition of word embedding #2

Closed by hieuddo 4 years ago

hieuddo commented 4 years ago

Could you explain the embedding process? Why is it done as in the following code?

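import numpy as np

# cand: presumably the vectors of words that do have a pretrained embedding
cand = np.array(cand, dtype='float32')
mu = np.mean(cand, axis=0)   # per-dimension mean
Sigma = np.cov(cand.T)       # covariance matrix across dimensions
# Draw a single sample; every placeholder below receives this same vector
norm = np.random.multivariate_normal(mu, Sigma, 1)
for i in range(len(embedding_matrix)):
    # An int entry appears to mark a word without a pretrained vector
    if type(embedding_matrix[i]) == int:
        embedding_matrix[i] = np.reshape(norm, 300)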

Curlykonda commented 4 years ago

For unknown words, we initialise the word embedding vector with random values, with 300 dimensions. These values are sampled from a multivariate Gaussian, which is defined by the mean and covariance matrix of the candidate embeddings, so the random vector has statistics similar to the known embeddings. Note that the snippet draws a single sample and assigns that same vector to every unknown word. This initialisation method is one of many options. For instance, we could also sample uniform noise in the interval [-0.5, 0.5], or sample from any other probability distribution. However, it is common to use a Gaussian distribution for that.
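To make the reply above concrete, here is a minimal sketch of the same idea on toy data. It is not the repository's code: the function name `init_unknown_embeddings`, the 4-dimensional toy vectors, and the choice to draw a separate sample for each unknown word (rather than one shared sample) are assumptions made for illustration.

```python
import numpy as np

def init_unknown_embeddings(embedding_matrix, rng=None):
    """Replace int placeholders with Gaussian samples fitted to the known vectors.

    Hypothetical helper: draws one sample per unknown word, which differs
    from the snippet in the question (one shared sample).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Collect the vectors that already exist (non-int entries).
    known = np.array(
        [v for v in embedding_matrix if not isinstance(v, int)], dtype="float32"
    )
    mu = known.mean(axis=0)   # per-dimension mean
    sigma = np.cov(known.T)   # covariance matrix across dimensions
    missing = [i for i, v in enumerate(embedding_matrix) if isinstance(v, int)]
    if missing:
        samples = rng.multivariate_normal(mu, sigma, size=len(missing))
        for i, vec in zip(missing, samples):
            embedding_matrix[i] = vec.astype("float32")
    return np.stack(embedding_matrix)

# Toy data: 50 known 4-dimensional vectors, plus two int placeholders.
rng = np.random.default_rng(0)
matrix = [0] + [rng.normal(size=4).astype("float32") for _ in range(50)] + [0]
filled = init_unknown_embeddings(matrix, rng)
print(filled.shape)  # (52, 4)
```

For the uniform alternative mentioned above, the sampling line could be swapped for something like `rng.uniform(-0.5, 0.5, size=(len(missing), known.shape[1]))`. Which initialisation works better is an empirical question that this sketch does not answer.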