stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Cannot reproduce Numbers in paper for Word Similarity #108

Open joeybose opened 6 years ago

joeybose commented 6 years ago

I have spent a long time trying to reproduce the word similarity numbers reported in the GloVe paper using the released 6B-token 300-dimensional word vectors, but I cannot match the reported scores on the RW and WS353 datasets. My first question: how do you handle a pair in which one of the words is out of vocabulary (OOV)? Do you score the pair with a cosine similarity of 0, or do you drop the pair entirely? On RW, I get 34.21 when OOV pairs are kept and 41.09 when they are excluded, while the paper reports 38.1. On WS353 my score is 60.85. I normalized the embeddings the same way your word analogy evaluation code does, and I computed the Spearman correlation with scipy.stats. It would be great to be able to reproduce the results reported in the paper.
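The evaluation described above could be sketched roughly as follows. This is a minimal illustration and not the paper's evaluation code: the function name `word_similarity_score`, the `oov` parameter, and the in-memory `vectors` dictionary are hypothetical, and the sketch assumes plain L2 normalization of each vector and a list of `(word1, word2, human_score)` triples. It shows how the two OOV policies (dropping the pair versus scoring it as 0) can lead to different Spearman correlations on the same data.

```python
import numpy as np
from scipy.stats import spearmanr


def word_similarity_score(vectors, pairs, oov="skip"):
    """Return (100 * Spearman rho, number of pairs scored).

    vectors: dict mapping word -> 1-D numpy array (hypothetical in-memory format)
    pairs:   iterable of (word1, word2, human_score) triples
    oov:     "skip" drops pairs with a missing word,
             "zero" keeps them with a predicted similarity of 0.0
    """
    if oov not in ("skip", "zero"):
        raise ValueError("oov must be 'skip' or 'zero'")

    predicted, gold = [], []
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            v1 = np.asarray(vectors[w1], dtype=float)
            v2 = np.asarray(vectors[w2], dtype=float)
            # Cosine similarity; dividing by the norms is the assumed
            # L2 normalization step.
            sim = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        elif oov == "zero":
            sim = 0.0
        else:
            continue  # oov == "skip": leave the pair out of the ranking
        predicted.append(sim)
        gold.append(human)

    rho, _ = spearmanr(predicted, gold)
    return 100.0 * rho, len(gold)


# Toy data for illustration only; these are not GloVe vectors or real ratings.
toy_vectors = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([1.0, 0.1]),
    "c": np.array([0.0, 1.0]),
}
toy_pairs = [("a", "b", 9.0), ("a", "c", 1.0), ("b", "c", 2.0), ("a", "zzz", 5.0)]

score_skip, n_skip = word_similarity_score(toy_vectors, toy_pairs, oov="skip")
score_zero, n_zero = word_similarity_score(toy_vectors, toy_pairs, oov="zero")
print(f"skip OOV: {score_skip:.2f} over {n_skip} pairs")
print(f"zero OOV: {score_zero:.2f} over {n_zero} pairs")
```

Reporting the number of scored pairs alongside the correlation makes it easier to tell which OOV policy a published figure used, since the two policies rank different numbers of pairs.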