SWE Toolkit. Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints. A general framework to incorporate semantic knowledge into the popular data-driven learning process of word vectors. Applications including word similarity, sentence completion, etc. ACL-2015, Beijing, China
I ran SWE_Train with the following command:
./SWE/bin/SWE_Train -debug 2 -size 100 -train ./corpora/corpus.txt -read-vocab ./corpora/vocabulary.txt -cbow 0 -hs 0 -alpha 0.025 -window 5 -sample 0.0001 -negative 5 -threads 1 -output ./word_embed.txt -sem-coeff 0.005 -sem-addtime 0 -sem-hinge 0.0 -weight-decay 0.0 -sem-train ./semantics/knowledge_constraints.train -sem-valid ./semantics/knowledge_constraints.valid -iter 2
The files ./semantics/knowledge_constraints.train and ./semantics/knowledge_constraints.valid are copies of SemWE.EN.KnowDB.COM1.inTEXT8.train and SemWE.EN.KnowDB.COM1.inTEXT8.valid from the semantics/TEXT8 directory.
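A minimal sketch of how that equivalence could be double-checked before training, assuming the toolkit is unpacked in the current directory. The helper name `same_file` and the `TEXT8_DIR` variable are hypothetical; only the file names come from this post.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both files exist and are byte-identical.
same_file() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Assumption: the reference files live under ./semantics/TEXT8.
TEXT8_DIR=./semantics/TEXT8

for split in train valid; do
    mine=./semantics/knowledge_constraints.$split
    ref=$TEXT8_DIR/SemWE.EN.KnowDB.COM1.inTEXT8.$split
    if same_file "$mine" "$ref"; then
        echo "$split: identical"
    else
        echo "$split: different or missing"
    fi
done
```

If either line reports "different or missing", the constraint files passed to -sem-train and -sem-valid are not the TEXT8 ones.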
The output I got is:
Semantic Word Embedding (SWE) Toolkit
Train Setting embedding size: 100
Train Setting window size: 5
Train Setting sample value: 0.000100
Train Setting negative num: 5
Running Threads: 1
Iteration Times: 2
SemWE Qsem train file: ./semantics/knowledge_constraints.train
SemWE Qsem valid file: ./semantics/knowledge_constraints.valid
SemWE Add Time(/%): 0.000000
SemWE Weight Decay: 0.000000
SemWE Inter Coeff: 0.005000
SemWE Norm Hinge Margin: 0.000000
SemWE Inequation Delta Left: 1
SemWE Inequation Delta Right: 1
Starting training using file ./corpora/corpus.txt
Vocab size: 47091
Words in train file: 9614559
Did I use SWE_Train incorrectly? Btw, I suggest providing documentation that explains how to use SWE_Train.
Thanks for your help.