stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Adaptive gradient update bug? #51

Closed (urialon closed this issue 7 years ago)

urialon commented 7 years ago

Hi, I'm a CS M.Sc. student from the Technion, Israel, researching applications of word vectors to programming languages. Thanks for your code, it's great.

In glove.c, lines 136-144, are the values of temp1 and temp2 a mistake, or should it be like this?

    // learning rate times gradient for word vectors
    temp1 = fdiff * W[b + l2];
    temp2 = fdiff * W[b + l1];
    // adaptive updates
    W_updates1[b] = temp1 / sqrt(gradsq[b + l1]);
    W_updates2[b] = temp2 / sqrt(gradsq[b + l2]);
    W_updates1_sum += W_updates1[b];
    W_updates2_sum += W_updates2[b];
    gradsq[b + l1] += temp1 * temp1;
    gradsq[b + l2] += temp2 * temp2;

temp1 gets the value of W in line l2: `temp1 = fdiff * W[b + l2];` but then gradsq in line l1 is increased by temp1 * temp1: `gradsq[b + l1] += temp1 * temp1;`

So - I think the values of temp1 and temp2 are swapped. What do you think? Thanks!

urialon commented 7 years ago

My mistake: differentiating the GloVe objective function shows that the code is correct.
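
For readers who land on the same question, the reasoning can be sketched as follows. This is a reconstruction, not text from the thread. It assumes the single-pair cost carries a factor of 1/2, that `l1` is the offset of the word vector $w_i$ and `l2` the offset of the context vector $\tilde{w}_j$, and that `fdiff` holds $f(X_{ij})\,d_{ij}$ scaled by the learning rate. The symbols $d_{ij}$ and $k$ are introduced here for illustration only.

```latex
\begin{align}
% Weighted squared error for one co-occurrence pair (i, j)
J_{ij} &= \tfrac{1}{2}\, f(X_{ij})\, d_{ij}^{2},
\qquad
d_{ij} = w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \\
% Gradient w.r.t. component k of the word vector: uses the CONTEXT vector
\frac{\partial J_{ij}}{\partial w_{i,k}} &= f(X_{ij})\, d_{ij}\, \tilde{w}_{j,k} \\
% Gradient w.r.t. component k of the context vector: uses the WORD vector
\frac{\partial J_{ij}}{\partial \tilde{w}_{j,k}} &= f(X_{ij})\, d_{ij}\, w_{i,k}
\end{align}
```

Under these assumptions, `temp1 = fdiff * W[b + l2]` is the gradient with respect to `W[b + l1]`, because differentiating the dot product with respect to one vector leaves the other vector as the coefficient. Accumulating `temp1 * temp1` into `gradsq[b + l1]` therefore pairs each parameter with its own squared gradient, which is what the adaptive update expects. The indices look crossed, but they are not swapped.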