wlin12 / wang2vec

Extension of the original word2vec using different architectures
Apache License 2.0
210 stars, 49 forks

Segmentation fault cbow size 600 or more #8

Open CShulby opened 7 years ago

CShulby commented 7 years ago

I get a segmentation fault with high dimensions (600 or more) using cbow. Plain word2vec runs fine at this size, but wang2vec does not. I am able to run wang2vec with skip.

Here is the error:

line 1: 18929 Segmentation fault ./word2vec -train final.txt -output cbow_600 -size 600 -binary 1 -type 2

And the output from my log:

Starting training using file final.txt
Vocab size: 934966
Words in train file: 1461491292
Alpha: 0.047882 Progress: 4.24% Words/thread/sec: 15.22k

wlin12 commented 7 years ago

Hi,

Have you tried setting -cap 5 or some other value? People have reported that cwindow can be subject to exploding gradients for some datasets and large embedding sizes. Setting -cap will cap the gradient to that value.
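To illustrate the general idea of capping, here is a minimal Python sketch that clamps each component of an update vector to the range [-cap, cap]. It is not taken from the wang2vec source (which is written in C) and may differ from what `-cap` actually does there. The helper name `cap_values` and the sample numbers are hypothetical.

```python
def cap_values(values, cap):
    """Clamp every entry of `values` to the closed range [-cap, cap]."""
    return [max(-cap, min(cap, v)) for v in values]


# Hypothetical update vector with two components that have blown up.
update = [0.3, -12.0, 7.5]
print(cap_values(update, 5.0))  # [0.3, -5.0, 5.0]
```

Small components pass through unchanged. Only the components whose magnitude exceeds the cap are pulled back to the boundary, which keeps a single bad step from pushing the weights to extreme values.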

Cheers, Wang Ling
