stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Negative Nan cost for every iteration #85

Closed queirozfcom closed 7 years ago

queirozfcom commented 7 years ago

Hi. After processing a large (30 GB) input file, I tried training GloVe on it, but I ran into a problem: every epoch reports -nan as the cost. I've pasted all the relevant output below:

(train-file.sh is just a modified version of demo.sh using my input file instead of text8)

ubuntu@ip-172-30-4-85:~/GloVe$ ./train-file.sh 
$ build/vocab_count -min-count 20 -verbose 2 < Full-File-Tokenized-Single-Line-No-unks.txt > vocab.txt
BUILDING VOCABULARY
Processed 6976519851 tokens.
Counted 36371606 unique words.
Truncating vocabulary at min count 20.
Using vocabulary of size 1817927.

$ build/cooccur -memory 6.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < Full-File-Tokenized-Single-Line-No-unks.txt > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 20163704
overflow length: 57042534
Reading vocab from file "vocab.txt"...loaded 1817927 words.
Building lookup table...table contains 260465759 elements.
Processed 6976514733 tokens.
Writing cooccurrences to disk...........116 files in total.
Merging cooccurrence files: processed 1209229520 lines.

$ build/shuffle -memory 6.0 -verbose 2 < cooccurrence.bin > cooccurrence.shuf.bin
SHUFFLING COOCCURRENCES
array size: 382520524
Shuffling by chunks: processed 0 lines.
Wrote 1 temporary file(s).
Merging temp files: processed 0 lines.

$ build/glove -save-file vectors -threads 8 -input-file cooccurrence.shuf.bin -x-max 10 -iter 15 -vector-size 50 -binary 2 -vocab-file vocab.txt -verbose 2
TRAINING MODEL
Read 0 lines.
Initializing parameters...done.
vector size: 50
vocab size: 1817927
x_max: 10.000000
alpha: 0.750000
06/10/17 - 12:16.17AM, iter: 001, cost: -nan
06/10/17 - 12:16.17AM, iter: 002, cost: -nan
06/10/17 - 12:16.17AM, iter: 003, cost: -nan
06/10/17 - 12:16.17AM, iter: 004, cost: -nan
06/10/17 - 12:16.17AM, iter: 005, cost: -nan
06/10/17 - 12:16.17AM, iter: 006, cost: -nan
06/10/17 - 12:16.17AM, iter: 007, cost: -nan
06/10/17 - 12:16.17AM, iter: 008, cost: -nan
06/10/17 - 12:16.17AM, iter: 009, cost: -nan
06/10/17 - 12:16.17AM, iter: 010, cost: -nan
06/10/17 - 12:16.17AM, iter: 011, cost: -nan
06/10/17 - 12:16.17AM, iter: 012, cost: -nan
06/10/17 - 12:16.17AM, iter: 013, cost: -nan
06/10/17 - 12:16.17AM, iter: 014, cost: -nan
06/10/17 - 12:16.17AM, iter: 015, cost: -nan
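The suspicious lines in the logs above are `Shuffling by chunks: processed 0 lines.` and `Read 0 lines.` — the shuffled file the trainer consumed appears to contain no records at all. One way to check this directly is to parse the file. This is a sketch, assuming the on-disk layout written by `cooccur.c` (`struct { int word1; int word2; double val; }`, 16 bytes per record on typical 64-bit platforms); the filename simply mirrors the log above:

```python
# Sketch: inspect a GloVe cooccurrence/shuffle output file.
# Assumes cooccur.c's record layout: int word1, int word2, double val.
import os
import struct

CREC = struct.Struct("=iid")  # 4 + 4 + 8 = 16 bytes per record

def inspect_cooccurrences(path, max_records=5):
    """Return (total record count, first few records) for a cooccurrence file."""
    size = os.path.getsize(path)
    if size % CREC.size != 0:
        raise ValueError("file size %d is not a multiple of %d; truncated file?"
                         % (size, CREC.size))
    count = size // CREC.size
    records = []
    with open(path, "rb") as f:
        for _ in range(min(count, max_records)):
            records.append(CREC.unpack(f.read(CREC.size)))
    return count, records
```

A zero record count here (an empty `cooccurrence.shuf.bin`) would be consistent with the trainer's `Read 0 lines.` message.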

Things I've noticed:

Other info:

Things I've tried:

Any thoughts about this?
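A plausible explanation for the -nan (an assumption, not confirmed against the exact source revision): `glove.c` appears to print the average cost per training example, i.e. `total_cost / num_lines`, each iteration. With the shuffled file empty, `num_lines` is 0, and in C the double division `0.0 / 0.0` yields NaN under IEEE 754 rather than raising an error. Python raises `ZeroDivisionError` instead, so this sketch models the C behaviour explicitly:

```python
# Sketch of the suspected failure mode: averaging cost over zero examples.
# In C, 0.0 / 0.0 silently evaluates to NaN (IEEE 754); Python raises,
# so we reproduce the C semantics by hand.
import math

def average_cost(total_cost, num_lines):
    """Average cost per example, with C-style 0/0 -> NaN semantics."""
    if num_lines == 0:
        return float("nan")  # what C's double division would produce
    return total_cost / num_lines

print(average_cost(0.0, 0))  # prints: nan
```

Under this reading, the -nan cost is a symptom, not the root cause: the real question is why zero lines were read, which points back at the shuffle step.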

queirozfcom commented 7 years ago

Closing this, as these errors went away when I ran the program on a machine with more memory and more disk space (many temporary files are written to disk during the preprocessing steps).
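Given that resolution, a pre-flight disk check can catch this before a long run: if the disk fills while `shuffle` is writing its temporary chunk files, the merged output can end up empty without an explicit error. A minimal sketch, where the path and the 2x safety factor are illustrative assumptions rather than values taken from the GloVe tools:

```python
# Pre-flight sketch: verify there is enough free disk for the shuffle
# step's temporary files. The safety factor is a rough assumption.
import os
import shutil

def enough_disk_for_shuffle(cooccur_path, workdir=".", factor=2.0):
    """Require roughly `factor` times the cooccurrence file's size free,
    since shuffling writes temporary chunks before merging them."""
    needed = int(factor * os.path.getsize(cooccur_path))
    free = shutil.disk_usage(workdir).free
    return free >= needed, needed, free
```

Running a check like this against `cooccurrence.bin` before invoking `build/shuffle` would have flagged the low-disk condition up front.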