stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Caught NaN in diff for kdiff for thread. Skipping update #90

Closed — SlavaChicago closed this issue 4 years ago

SlavaChicago commented 7 years ago

The message "Caught NaN in diff for kdiff for thread. Skipping update" appears after the 3rd or 4th iteration. It then keeps printing forever, and it seems like nothing happens after it starts.

The file is about 1B tokens of mostly Russian with some English text. GloVe works fine with just part of the file (about 300M tokens), which is a Russian Wikipedia corpus. Then we parsed a lot of web pages, deleted all tags and code from them, and appended that text to the Russian wiki corpus; with that combined file we get the error.

Why can it appear? Where should I look for a solution? Should I try to localize the part of the corpus where it breaks? Or maybe the corpus is too big for my server (RAM is actually sufficient; usage peaks at about 80% while running)?

These are my settings: CORPUS=full_ru_texts3.txt VOCAB_FILE=vocab_full_ru_text_300_8.txt COOCCURRENCE_FILE=cooccurrence.bin COOCCURRENCE_SHUF_FILE=cooccurrence.shuf.bin BUILDDIR=build SAVE_FILE=vectors_full_ru_text_300_8.txt VERBOSE=2 MEMORY=4.0 VOCAB_MIN_COUNT=4 VECTOR_SIZE=300 MAX_ITER=50 WINDOW_SIZE=8 BINARY=2 NUM_THREADS=7 X_MAX=10

Thank you!

NiuQian commented 6 years ago

I get the same problem.

SeyedMST commented 5 years ago

see this: https://github.com/stanfordnlp/GloVe/issues/50

AngledLuffa commented 4 years ago

Aside from the other linked issues, another possibility: we recently merged a pull request that fixed some NaNs in training by clipping the gradients. Hopefully it is now resolved.