stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Seg Fault during shuffling #88

Open nvthanhtu opened 7 years ago

nvthanhtu commented 7 years ago

```
hachiko@ubuntu:~/Desktop/Glove/GloVe-master$ ./shuffle -verbose 2 -memory 8.0 < cooccurrence.bin > cooccurrence.shuf.bin
SHUFFLING COOCCURRENCES
array size: 510027366
Shuffling by chunks: processed 0 lines.
Segmentation fault (core dumped)
```

I am still able to create vectors.txt by running ./glove, but the results are far from accurate:

```
Enter three words (EXIT to break): be was have

Word: be  Position in vocabulary: 31
Word: was  Position in vocabulary: 17
Word: have  Position in vocabulary: 38

           Word    Cosine distance
        fleeing    0.400954
           mond    0.391636
  discordianism    0.389428
    biochemical    0.368640
          hobby    0.365726
   polyrhythmic    0.364714
     roosevelts    0.364072
```

nvthanhtu commented 7 years ago

The reason for this is memory: my VM doesn't have enough memory to run shuffle (the default is 4). I changed it to 1 and it worked.
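As a rough check on why 8.0 fails here: the `array size: 510027366` line in the log is consistent with shuffle reserving about 0.95 × `-memory` GiB and splitting it into 16-byte records (two `int` word indices plus a `double` count). The sketch below assumes that formula and record size; they reproduce the logged number, but they are a reading of the log, not a quote from shuffle.c. `shuffle_array_size` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: estimate the buffer ./shuffle tries to allocate for a given -memory value.
# ASSUMPTIONS: buffer = 0.95 * memory * 2^30 bytes, split into 16-byte records
# (int word1, int word2, double val). This reproduces "array size: 510027366" for 8.0.

# Hypothetical helper: number of records buffered for -memory $1 (in GB).
shuffle_array_size() {
    awk -v mem="$1" 'BEGIN { printf "%d\n", 0.95 * mem * 1073741824 / 16 }'
}

for mem in 8.0 4.0 1.0; do
    n=$(shuffle_array_size "$mem")
    # Total bytes requested = records * 16, reported in GiB.
    awk -v mem="$mem" -v n="$n" \
        'BEGIN { printf "-memory %s: %d records, %.2f GiB\n", mem, n, n * 16 / 1073741824 }'
done
```

For 8.0 this prints 510027366 records (7.60 GiB), matching the log, while 1.0 asks for 63753420 records (0.95 GiB). Comparing that figure against the VM's free memory (for example with `free -g`) before picking a value should avoid the crash.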

dutkaD commented 6 years ago

> The reason for this is memory: my VM doesn't have enough memory to run shuffle (the default is 4). I changed it to 1 and it worked.

Could you please tell me exactly what you changed: which value, and where?

PuJes commented 4 years ago

In demo.sh, lower the `MEMORY` value (4.0 in the script below), which is passed to the tools as `-memory`:

```bash
#!/bin/bash
set -e

make
if [ ! -e text8 ]; then
  if hash wget 2>/dev/null; then
    wget http://mattmahoney.net/dc/text8.zip
  else
    curl -O http://mattmahoney.net/dc/text8.zip
  fi
  unzip text8.zip
  rm text8.zip
fi

CORPUS=text8
VOCAB_FILE=vocab.txt
COOCCURRENCE_FILE=cooccurrence.bin
COOCCURRENCE_SHUF_FILE=cooccurrence.shuf.bin
BUILDDIR=build
SAVE_FILE=vectors
VERBOSE=2
MEMORY=4.0
VOCAB_MIN_COUNT=5
VECTOR_SIZE=50
MAX_ITER=15
WINDOW_SIZE=15
BINARY=2
NUM_THREADS=8
X_MAX=10
if hash python 2>/dev/null; then
  PYTHON=python
else
  PYTHON=python3
fi

...
```
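Outside demo.sh, shuffle can also be re-run by hand with a smaller `-memory`, using the same flags as the original command. Because shuffle only reorders fixed-size records, a complete output file should be exactly as large as its input, so a size comparison can catch a run that stopped partway (the original run processed 0 lines). This is a minimal sketch under those assumptions: `check_shuffle_output` and the `run` argument are hypothetical, and the paths assume the binaries and data sit in the current directory as in the log.

```shell
#!/bin/bash
# Sketch: re-run shuffle with a smaller buffer and sanity-check the result.
# ASSUMPTIONS: binaries and data sit in the current directory (as in the log);
# check_shuffle_output is a hypothetical helper, not part of GloVe.

# shuffle permutes fixed-size records, so a complete output should have
# exactly as many bytes as the input; a shorter file means it stopped early.
check_shuffle_output() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "missing input or output file" >&2
        return 1
    fi
    local in_bytes out_bytes
    in_bytes=$(wc -c < "$1")
    out_bytes=$(wc -c < "$2")
    if [ "$in_bytes" -ne "$out_bytes" ]; then
        echo "$2 has $out_bytes bytes, expected $in_bytes" >&2
        return 1
    fi
}

# Only touch the real files when the script is invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    ./shuffle -verbose 2 -memory 1.0 < cooccurrence.bin > cooccurrence.shuf.bin
    status=$?
    check_shuffle_output cooccurrence.bin cooccurrence.shuf.bin || status=1
    exit "$status"
fi
```

If the check passes, re-running ./glove on the new cooccurrence.shuf.bin should give vectors worth evaluating again.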