iunderstand / SWE

SWE Toolkit. Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints. A general framework to incorporate semantic knowledge into the popular data-driven learning process of word vectors. Applications including word similarity, sentence completion, etc. ACL-2015, Beijing, China
Apache License 2.0

Segmentation Fault for SWE_Train #3

Open shawnspace opened 6 years ago

shawnspace commented 6 years ago

I ran SWE_Train with the following command:

./SWE/bin/SWE_Train -debug 2 -size 100 -train ./corpora/corpus.txt -read-vocab ./corpora/vocabulary.txt -cbow 0 -hs 0 -alpha 0.025 -window 5 -sample 0.0001 -negative 5 -threads 1 -output ./word_embed.txt -sem-coeff 0.005 -sem-addtime 0 -sem-hinge 0.0 -weight-decay 0.0 -sem-train ./semantics/knowledge_constraints.train -sem-valid ./semantics/knowledge_constraints.valid -iter 2

The files ./semantics/knowledge_constraints.train and ./semantics/knowledge_constraints.valid are the same as SemWE.EN.KnowDB.COM1.inTEXT8.train and SemWE.EN.KnowDB.COM1.inTEXT8.valid in the semantics/TEXT8 directory.

The output I got is:

Semantic Word Embedding (SWE) Toolkit
Train Setting embedding size: 100
Train Setting window size: 5
Train Setting sample value: 0.000100
Train Setting negative num: 5
Running Threads: 1
Iteration Times: 2
SemWE Qsem train file: ./semantics/knowledge_constraints.train
SemWE Qsem valid file: ./semantics/knowledge_constraints.valid
SemWE Add Time(/%): 0.000000
SemWE Weight Decay: 0.000000
SemWE Inter Coeff: 0.005000
SemWE Norm Hinge Margin: 0.000000
SemWE Inequation Delta Left: 1
SemWE Inequation Delta Right: 1

Training Starting @Time: Sat Nov 11 12:04:57 2017

Starting training using file ./corpora/corpus.txt
Vocab size: 47091
Words in train file: 9614559

Load Training Word Knowledge from file ./semantics/knowledge_constraints.train
--- InEquation Nums: 324817
--- Finish reading the Knowledge Database
Load CV Test Word Knowledge from file ./semantics/knowledge_constraints.valid
--- CV set InEquation Nums: 2999
--- Finish reading the CV Knowledge Database
--- Alpha: 0.025000 Progress: 0.00% WordCount: 0 Train_Qsem: inf Train_SatisfyRate: 0.0000 Valid_Qsem: inf Valid_SatisfyRate: 0.0000
Segmentation fault

Did I use SWE_Train incorrectly? By the way, I suggest providing documentation that explains how to use SWE_Train.

Thanks for your help

leye7755 commented 6 years ago

@shawnspace @iunderstand Hello, I ran into the same problem. Did you resolve it? Thank you.

iunderstand commented 6 years ago

Thanks!

You need to make sure that every word in the inequalities is contained in your vocabulary.
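One way to check this before training is to compare the words in the constraint files against the vocabulary file. The sketch below is not part of the SWE toolkit, and it rests on two assumptions that are not confirmed anywhere in this thread: that each vocabulary line starts with the word (as in word2vec-style "word count" files), and that every alphabetic token in a constraint line is a word. If the inequality files use a different layout, WORD_RE needs to be adjusted. The function names load_vocab, missing_words, and check_files are made up for illustration.

```python
import re
from typing import Iterable, Set

# ASSUMPTION: every alphabetic token in a constraint line is a word.
# The real SWE inequality format is not described in this thread,
# so adapt this pattern to the actual file layout.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'\-]*")


def load_vocab(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated field of each non-empty line.

    ASSUMPTION: the vocabulary file has one entry per line, with the
    word first (for example "word count").
    """
    vocab = set()
    for line in lines:
        fields = line.split()
        if fields:
            vocab.add(fields[0])
    return vocab


def missing_words(constraint_lines: Iterable[str], vocab: Set[str]) -> Set[str]:
    """Return every word found in the constraints that is not in vocab."""
    missing = set()
    for line in constraint_lines:
        for token in WORD_RE.findall(line):
            if token not in vocab:
                missing.add(token)
    return missing


def check_files(vocab_path: str, constraints_path: str) -> Set[str]:
    """Run the check on two files on disk (hypothetical helper)."""
    with open(vocab_path, encoding="utf-8") as vocab_file:
        vocab = load_vocab(vocab_file)
    with open(constraints_path, encoding="utf-8") as constraints_file:
        return missing_words(constraints_file, vocab)
```

For the command above, this could be called as check_files("./corpora/vocabulary.txt", "./semantics/knowledge_constraints.train") and again for the .valid file. If the returned set is not empty, those words would have to be added to the vocabulary or the affected inequalities removed before running SWE_Train.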