Closed by budhiraja 4 years ago
I get the same problem, but only after training completes all 100 iterations and the model starts saving the vectors. Were you able to fix this?
We had some pull requests improving memory management recently. If anyone comes across a similar problem again, please reopen this bug or open a new one.
I am trying to train GloVe on a high-end server with a medium-sized corpus, but I am getting memory errors. The weird part is that the error appears only after the first epoch finishes. Here is the generated log:
BUILDING VOCABULARY
Processed 62983105 tokens.
Counted 2515374 unique words.
Truncating vocabulary at min count 2.
Using vocabulary of size 817196.
COUNTING COOCCURRENCES
window size: 5
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "/lustre/amar/glove_models/vocab.txt"...loaded 817196 words.
Building lookup table...table contains 161333419 elements.
Processed 62981822 tokens.
Writing cooccurrences to disk...........3 files in total.
Merging cooccurrence files: processed 66886759 lines.
SHUFFLING COOCCURRENCES
array size: 255013683
Shuffling by chunks: processed 66886759 lines.
Wrote 1 temporary file(s).
Merging temp files: processed 66886759 lines.
TRAINING MODEL
Read 66886759 lines.
Initializing parameters...done.
vector size: 50
vocab size: 817196
x_max: 10.000000
alpha: 0.750000
iter: 001, cost: 0.108257
glibc detected build/glove: double free or corruption (!prev): 0x00000000007e4540 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3058875366]
/lib64/libc.so.6[0x3058877e93]
/lib64/libc.so.6(fclose+0x14d)[0x3058865add]
build/glove[0x4021b7]
build/glove[0x4025f0]
build/glove[0x402d34]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x305881ecdd]
build/glove[0x400db9]
======= Memory map: ========
00400000-00404000 r-xp 00000000 00:15 1653944383 /home/iiit/amar/Bug2Vec/glove/build/glove
00603000-00604000 rw-p 00003000 00:15 1653944383 /home/iiit/amar/Bug2Vec/glove/build/glove
007e3000-00804000 rw-p 00000000 00:00 0 [heap]
3058000000-3058020000 r-xp 00000000 fd:00 783778 /lib64/ld-2.12.so
305821f000-3058220000 r--p 0001f000 fd:00 783778 /lib64/ld-2.12.so
3058220000-3058221000 rw-p 00020000 fd:00 783778 /lib64/ld-2.12.so
3058221000-3058222000 rw-p 00000000 00:00 0
3058800000-3058989000 r-xp 00000000 fd:00 783779 /lib64/libc-2.12.so
3058989000-3058b88000 ---p 00189000 fd:00 783779 /lib64/libc-2.12.so
3058b88000-3058b8c000 r--p 00188000 fd:00 783779 /lib64/libc-2.12.so
3058b8c000-3058b8d000 rw-p 0018c000 fd:00 783779 /lib64/libc-2.12.so
3058b8d000-3058b92000 rw-p 00000000 00:00 0
3058c00000-3058c17000 r-xp 00000000 fd:00 783781 /lib64/libpthread-2.12.so
3058c17000-3058e17000 ---p 00017000 fd:00 783781 /lib64/libpthread-2.12.so
3058e17000-3058e18000 r--p 00017000 fd:00 783781 /lib64/libpthread-2.12.so
3058e18000-3058e19000 rw-p 00018000 fd:00 783781 /lib64/libpthread-2.12.so
3058e19000-3058e1d000 rw-p 00000000 00:00 0
3059400000-3059483000 r-xp 00000000 fd:00 783785 /lib64/libm-2.12.so
3059483000-3059682000 ---p 00083000 fd:00 783785 /lib64/libm-2.12.so
3059682000-3059683000 r--p 00082000 fd:00 783785 /lib64/libm-2.12.so
3059683000-3059684000 rw-p 00083000 fd:00 783785 /lib64/libm-2.12.so
305a800000-305a816000 r-xp 00000000 fd:00 783788 /lib64/libgcc_s-4.4.6-20120305.so.1
305a816000-305aa15000 ---p 00016000 fd:00 783788 /lib64/libgcc_s-4.4.6-20120305.so.1
305aa15000-305aa16000 rw-p 00015000 fd:00 783788 /lib64/libgcc_s-4.4.6-20120305.so.1
7fc048000000-7fc048021000 rw-p 00000000 00:00 0
7fc048021000-7fc04c000000 ---p 00000000 00:00 0
7fc04e099000-7fc04e09a000 ---p 00000000 00:00 0
7fc04e09a000-7fc09e280000 rw-p 00000000 00:00 0
7fc09e298000-7fc09e29a000 rw-p 00000000 00:00 0
7fff01705000-7fff0171a000 rw-p 00000000 00:00 0 [stack]
7fff017ff000-7fff01800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
./demo.sh: line 43: 1678 Aborted (core dumped) $BUILDDIR/glove -save-file $SAVE_FILE -threads $NUM_THREADS -input-file $COOCCURRENCE_SHUF_FILE -x-max $X_MAX -iter $MAX_ITER -vector-size $VECTOR_SIZE -binary $BINARY -vocab-file $VOCAB_FILE -verbose $VERBOSE
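For anyone trying to judge whether this is really an out-of-memory problem, here is a rough back-of-the-envelope estimate of the parameter memory implied by the log (vocab size 817196, vector size 50). It is only a sketch under assumptions that are not confirmed anywhere in this thread: that the trainer keeps a word vector and a context vector per vocabulary entry, each with one extra bias slot, plus a second array of the same shape for squared-gradient accumulators, and that every value is an 8-byte double. The function name `estimate_param_bytes` is made up for illustration.

```python
def estimate_param_bytes(vocab_size, vector_size, bytes_per_value=8):
    """Rough estimate of parameter memory for a GloVe-style trainer.

    Assumptions (not taken from the log): two vectors per word
    (word + context), one bias slot per vector, and a second array
    of identical shape for squared-gradient accumulators.
    """
    values_per_array = 2 * vocab_size * (vector_size + 1)
    number_of_arrays = 2  # weights + gradient accumulators (assumed)
    return number_of_arrays * values_per_array * bytes_per_value


# Values reported in the TRAINING MODEL section of the log.
vocab_size = 817196
vector_size = 50

total_bytes = estimate_param_bytes(vocab_size, vector_size)
print(f"estimated parameter memory: {total_bytes} bytes "
      f"({total_bytes / 2**30:.2f} GiB)")
```

Under these assumptions the parameters need a bit over 1.2 GiB, which a high-end server should handle comfortably. Together with `fclose` appearing in the backtrace, that suggests the abort may be a double free around closing a file rather than the machine running out of memory, but this is a guess from the log, not a confirmed diagnosis.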