stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Predicted vectors not normalized? #76

Open Archaneos opened 7 years ago

Archaneos commented 7 years ago

I'm not an expert in linear algebra, but shouldn't the predicted vectors pred_vec in evaluate.py be normalized so that the cosine similarity falls between -1 and 1? I was surprised to get values greater than 1 in some cases.
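For illustration, a minimal numpy sketch (hypothetical 2-d vectors, not the actual data from evaluate.py) of why the raw dot product can exceed 1 when pred_vec is not unit-length, while the properly normalized cosine similarity stays in [-1, 1]:

import numpy as np

# Hypothetical example: a unit-norm word vector and an unnormalized predicted vector.
w_row = np.array([0.6, 0.8])      # ||w_row|| = 1
pred_vec = np.array([1.2, 1.6])   # same direction, but ||pred_vec|| = 2

raw_dot = np.dot(w_row, pred_vec)                                      # 2.0, outside [-1, 1]
cosine = raw_dot / (np.linalg.norm(w_row) * np.linalg.norm(pred_vec))  # 1.0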

JungeAlexander commented 7 years ago

Just came across this, too. Each word vector (row of W) is normalized to unit norm in the following piece of code from evaluate.py:

# normalize each word vector to unit variance
W_norm = np.zeros(W.shape)
d = (np.sum(W ** 2, 1) ** (0.5))
W_norm = (W.T / d).T

The in-line comment is a bit misleading since the normalization concerns each row's Euclidean norm, not its variance. Just submitted a tiny PR changing this: https://github.com/stanfordnlp/GloVe/pull/86
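As a quick sanity check of that distinction (a small sketch with a made-up matrix, not code from evaluate.py), the rows of W_norm come out with unit Euclidean norm but, in general, not unit variance:

import numpy as np

W = np.random.rand(5, 50)              # hypothetical word-vector matrix
d = (np.sum(W ** 2, 1) ** (0.5))       # per-row Euclidean norms
W_norm = (W.T / d).T                   # same normalization as in evaluate.py

print(np.linalg.norm(W_norm, axis=1))  # all 1.0: unit norm
print(np.var(W_norm, axis=1))          # generally not 1.0: not unit variance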

So the dot product in the following should be equivalent to cosine similarity:

#cosine similarity if input W has been normalized
dist = np.dot(W, pred_vec.T)
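If pred_vec is not guaranteed to be unit-length, one way to keep the scores bounded would be to divide it by its norm before the dot product; a sketch of that idea (not the repository's actual fix):

pred_norm = pred_vec / np.linalg.norm(pred_vec)  # assumes pred_vec is nonzero
dist = np.dot(W, pred_norm.T)                    # true cosine similarity, within [-1, 1]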

Have you come across more cases where the similarity falls outside [-1, 1] since your last comment?