stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Segfault when running build/glove #63

Open dchaplinsky opened 7 years ago

dchaplinsky commented 7 years ago
$ time build/glove -save-file gigamega.cased.tokenized.glove.600d -threads 32 -input-file /mnt/cooccurrence.shuf.bin -x-max 40 -iter 10 -vector-size 600 -vocab-file /mnt/vocab.txt -verbose 3
TRAINING MODEL
Read 1325708976 lines.
Initializing parameters...done.
vector size: 600
vocab size: 3448322
x_max: 40.000000
alpha: 0.750000
Segmentation fault

Any clues? Anything I can debug? The corpus is 67GB, and all scripts prior to glove worked like a charm. The box has 128GB of memory and 32 cores.

Latest version of GloVe downloaded from http://nlp.stanford.edu/projects/glove/
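
One thing that can be checked from the numbers in the log is how large the parameter arrays get at this vocabulary and vector size. The sketch below is only back-of-the-envelope arithmetic. It assumes that glove.c keeps two arrays (weights and accumulated squared gradients), that each holds 2 * vocab_size * (vector_size + 1) values, and that each value is an 8-byte double. None of that is stated in this thread, and the helper names are hypothetical. It does not establish the cause of the segfault. It only shows whether the element count would fit in a 32-bit signed integer and roughly how much memory the arrays would need.

```python
# Back-of-the-envelope size check for the run in the log above.
# ASSUMPTIONS (not confirmed in this thread): two arrays (weights and squared
# gradients), each with 2 * vocab_size * (vector_size + 1) 8-byte values.

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def param_count(vocab_size: int, vector_size: int) -> int:
    """Elements in one array: word and context vectors, each with one bias term."""
    return 2 * vocab_size * (vector_size + 1)


def param_bytes(vocab_size: int, vector_size: int,
                bytes_per_value: int = 8, arrays: int = 2) -> int:
    """Total bytes for all assumed arrays."""
    return arrays * param_count(vocab_size, vector_size) * bytes_per_value


if __name__ == "__main__":
    vocab_size = 3448322   # "vocab size" from the log
    vector_size = 600      # "vector size" from the log

    n = param_count(vocab_size, vector_size)
    total = param_bytes(vocab_size, vector_size)

    print(f"elements per array: {n}")
    print(f"exceeds 32-bit signed int: {n > INT32_MAX}")
    print(f"approx. memory for both arrays: {total / 2**30:.2f} GiB")
```

Under those assumptions the per-array element count is above the 32-bit signed limit, so any index or size computed in a plain `int` would overflow, and the two arrays together take up a large share of the 128GB box. Running the binary under a debugger and getting a backtrace would be the way to confirm or rule this out.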

dchaplinsky commented 7 years ago

Hmm, it works with the latest code from GitHub.

Maybe it's time to update the page?