stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

cost: -nan #156

Closed tanya-ling closed 4 years ago

tanya-ling commented 4 years ago

I encounter `cost: -nan` after a few iterations (somewhere between 5 and 7) on my own dataset of 39 GB. The resulting vectors are also all NaNs.

I have read this issue, but decreasing the learning rate doesn't solve my problem (I tried 0.05, 0.005, 0.0005, and 0.0001, all with the same result).

The cost decreases for a few iterations but then goes to NaN, like this: [screenshot of training log]

I have a really small vocabulary of fewer than 5000 words (that's intentional, I pretokenized my corpus this way) and a large vector size (I tried 500 and 1000).

The co-occurrence matrix seems to be constructed fine and is about 396 MB; the vocab file also looks good. Even the vectors saved after a few iterations (before the -nan appears) seem reasonable and do not completely fail on a lexical similarity task.

However, I would like to continue training, since I am not sure the model has converged.

Please give me some advice on how to avoid -nan in the cost and vectors.

I installed GloVe like this, if it matters:

```
$ git clone http://github.com/stanfordnlp/glove
$ cd glove && make
```

AngledLuffa commented 4 years ago

I merged in a change that clips the gradients so they don't go infinite. Hopefully that resolves the NaNs.