The message "Caught NaN in diff for kdiff for thread. Skipping update" appears after the 3rd or 4th iteration. It just keeps printing forever, and it seems like nothing happens after it starts.
The file contains about 1 billion tokens of mostly Russian, plus some English, text.
GloVe works fine with just a part of the file (about 300M tokens), which is a Russian Wikipedia corpus.
Then we parsed a lot of web pages, deleted all tags and code from them, and appended that text to the Russian Wikipedia corpus; with the combined file we get the error.
Why can it appear? Where should I look for a solution? Should I try to localize the part of the corpus where it breaks? Or maybe the corpus is too big for my server (RAM is actually enough; usage peaks at about 80% while running)?
These are my settings:
CORPUS=full_ru_texts3.txt
VOCAB_FILE=vocab_full_ru_text_300_8.txt
COOCCURRENCE_FILE=cooccurrence.bin
COOCCURRENCE_SHUF_FILE=cooccurrence.shuf.bin
BUILDDIR=build
SAVE_FILE=vectors_full_ru_text_300_8.txt
VERBOSE=2
MEMORY=4.0
VOCAB_MIN_COUNT=4
VECTOR_SIZE=300
MAX_ITER=50
WINDOW_SIZE=8
BINARY=2
NUM_THREADS=7
X_MAX=10
Aside from the other linked issues, another possibility: we recently merged a pull request that fixed some NaNs in training by clipping the gradients. Hopefully it is now resolved.
Thank you!