Closed: hrzafer closed this issue 8 years ago.
Thanks for reporting these. Will try to reproduce.
I also get this error on a virtual Xubuntu 14.04, in addition to the compiler warnings. Linux Mint 17.1 on VirtualBox compiles with no warnings.
FYI, I'm actually a Windows user, and thankfully I managed to compile and run GloVe with Cygwin (on Windows 10) without any problems.
Sorry for the slow reply. I meant to respond to this earlier, after merging https://github.com/stanfordnlp/GloVe/pull/3. Do you mind trying this again to see whether you still get a segmentation fault after the fix? If you do, could you run
valgrind --tool=memcheck $BUILDDIR/shuffle ...
and paste the result?
I'm going to assume this was solved by our memory pull request fixes. Please reopen if you are able to reproduce on the most recent version.
Sorry for the late feedback. I use VirtualBox 5.0.10 with Windows 10 as host and Linux Mint 17.1 (2 GB RAM, gcc 4.8) as guest. I just downloaded the latest version as a zip and extracted it. This is my terminal output:
harun@harun-mint ~/Desktop/GloVe-master $ make
mkdir -p build
gcc src/glove.c -o build/glove -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/shuffle.c -o build/shuffle -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/cooccur.c -o build/cooccur -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/vocab_count.c -o build/vocab_count -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
harun@harun-mint ~/Desktop/GloVe-master $ ./demo.sh
mkdir -p build
gcc src/glove.c -o build/glove -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/shuffle.c -o build/shuffle -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/cooccur.c -o build/cooccur -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/vocab_count.c -o build/vocab_count -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
--2015-11-22 14:24:24-- http://mattmahoney.net/dc/text8.zip
Resolving mattmahoney.net (mattmahoney.net)... 98.139.135.129
Connecting to mattmahoney.net (mattmahoney.net)|98.139.135.129|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31344016 (30M) [application/zip]
Saving to: ‘text8.zip’
100%[======================================>] 31.344.016 678KB/s in 60s
2015-11-22 14:25:25 (508 KB/s) - ‘text8.zip’ saved [31344016/31344016]
Archive: text8.zip
inflating: text8
BUILDING VOCABULARY
Processed 17005207 tokens.
Counted 253854 unique words.
Truncating vocabulary at min count 5.
Using vocabulary of size 71290.
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 71290 words.
Building lookup table...table contains 94990279 elements.
Processing token: 15100000./demo.sh: line 55: 2710 Killed $BUILDDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
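A note for readers hitting the same message: "Killed" (as opposed to a segfault) usually means the kernel's OOM killer terminated cooccur because its tables outgrew the VM's 2 GB of RAM. As the failing command line above shows, demo.sh passes a memory budget via -memory; a minimal sketch of the mitigation, assuming demo.sh's MEMORY variable is that budget in GB:

```shell
# Lower the memory budget before rerunning demo.sh. The value is a soft
# limit, in GB, on cooccur's/shuffle's in-memory buffers; 1.0 leaves
# headroom for the OS on a 2 GB VM.
MEMORY=1.0
echo "memory budget: ${MEMORY} GB"
# demo.sh then invokes: $BUILDDIR/cooccur -memory $MEMORY ...
```

This only shrinks the working buffers (cooccur spills to more intermediate files instead); it does not change the resulting vectors.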
I increased the VM's main memory to 4 GB and retried:
harun@harun-mint ~/Desktop/GloVe-master $ ./demo.sh
mkdir -p build
gcc src/glove.c -o build/glove -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/shuffle.c -o build/shuffle -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/cooccur.c -o build/cooccur -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/vocab_count.c -o build/vocab_count -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
BUILDING VOCABULARY
Processed 17005207 tokens.
Counted 253854 unique words.
Truncating vocabulary at min count 5.
Using vocabulary of size 71290.
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 71290 words.
Building lookup table...table contains 94990279 elements.
Processed 17005206 tokens.
Writing cooccurrences to disk.........2 files in total.
Merging cooccurrence files: processed 60666466 lines.
SHUFFLING COOCCURRENCES
array size: 255013683
Shuffling by chunks: processed 0 lines../demo.sh: line 55: 2294 Segmentation fault $BUILDDIR/shuffle -memory $MEMORY -verbose $VERBOSE < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE
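For context on this second failure: the segfault in shuffle is consistent with the shuffle array alone nearly exhausting the 4 GB VM. A back-of-envelope check, assuming roughly 16 bytes per cooccurrence record (two 4-byte ints plus an 8-byte double; the actual struct size may differ):

```shell
records=255013683      # "array size" reported in the log above
bytes_per_record=16    # assumed record layout: int, int, double
echo $(( records * bytes_per_record / 1024 / 1024 )) MiB
# prints "3891 MiB" -- nearly all of the VM's 4 GB before the OS takes its share
```

So even at 4 GB the allocation can fail, and an unchecked failed allocation then crashes on the first write.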
Hi, did you manage to solve the problem? I have the same segmentation fault issue. Thanks!
@asnatm I'm currently working on a separate segfault issue in build/glove in https://github.com/stanfordnlp/GloVe/issues/14. Can you clarify the circumstances under which you're getting the segfault (i.e. in glove.cpp or shuffle.cpp, and using which corpus).
I'm getting the segfault when running the demo script, at the shuffling-by-chunks step (shuffle.cpp). I'm using my own corpus, a small one with only 23405 words in the vocab. Thanks!
I understand that you may not be able to share the corpus. But it's pretty difficult for me to help if I can't reproduce locally. Could you try cutting the corpus size in half repeatedly until you don't get the problem any more, and then sending me the smallest such corpus, perhaps with words substituted out for numbers representing their indices in the vocab?
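The halving exercise can be scripted; a minimal sketch, assuming a corpus with one document per line (the helper name and file paths are placeholders, not part of GloVe):

```shell
# Keep the first half of the corpus's lines, then rerun the failing
# command against the output; repeat until it no longer crashes, and
# keep the smallest corpus that still fails.
halve() {
    local src=$1 dst=$2
    head -n $(( $(wc -l < "$src") / 2 )) "$src" > "$dst"
}

printf 'a\nb\nc\nd\n' > corpus.txt    # stand-in 4-line corpus
halve corpus.txt corpus.half.txt
wc -l < corpus.half.txt               # prints 2 with GNU coreutils
```

If the corpus is a single long line (like text8), split on tokens instead of lines before bisecting.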
Hi,
This is a link to the corpus: https://www.dropbox.com/s/vqa6ogtxipy2lvz/myCorpus.txt?dl=0 (too large to attach). Is the format OK? I'll do the exercise you suggested and send you the smallest corpus that still triggers the problem.
Highly appreciated,
Thanks, Asi
Hi,
I solved it; I didn't have enough memory. Sorry!
Thanks, Asi
Might sound stupid, but for me the problem was solved just by restarting the computer :snail:
Hi all. I ran into the same problem. The cause is memory: you can adjust the processing memory (the MEMORY setting passed to -memory in demo.sh) according to your computer's memory.
bummer
demo.sh fails on a virtual Linux Mint 17.1 (based on Ubuntu 14.04) on VirtualBox with 4 GB RAM.