stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

memory issue on training glove #198

Closed lfoppiano closed 2 years ago

lfoppiano commented 2 years ago

After several attempts, I reached the part of demo.sh where it trains and dumps the model. Unfortunately, I get the following error:

(base) [lfoppian0@sakura02 GloVe]$ build/glove -write-header 1 -save-file vectors -threads 10 -input-file cooccurrence.shuf.bin -x-max 10 -iter 100 -vector-size 300 -binary 2 -vocab-file vocab.txt -verbose 0
TRAINING MODEL
Read 128517745517 lines.
Using random seed 1633594412
Error allocating memory for W

The glove command does not have a -memory option, so I wonder whether this is because my shuffled co-occurrence file is 1.2 TB...

Any clue or suggestion is welcome.

Thanks in advance

AngledLuffa commented 2 years ago

That does seem quite large. It looks like glove simply uses as much memory as it can get. Based on that error line, I think it read the whole file but then couldn't allocate W. How about pruning some of the lines or some of the words? I don't know how to pick the best ones to prune, though.