Closed chenbjin closed 7 years ago
Hello,
You need to make sure that every word in the semantic constraint set is also in the corpus vocabulary.
Liu Quan
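The check described above can be sketched in Python. This is only a rough sketch under stated assumptions: it assumes each line of the constraint file is a whitespace-separated list of words (adjust the parsing to the actual file format), and the names `build_vocab`, `filter_constraints`, and `min_count` are hypothetical helpers, not part of the SWE toolkit. If the trainer prunes rare words the way word2vec does, `min_count` should match that threshold.

```python
from collections import Counter


def build_vocab(corpus_lines, min_count=1):
    """Collect the words that occur at least min_count times in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split())
    return {word for word, count in counts.items() if count >= min_count}


def filter_constraints(constraint_lines, vocab):
    """Split constraint lines into those fully covered by vocab and the rest.

    Assumption: every whitespace-separated token on a line is a word.
    """
    kept, dropped = [], []
    for line in constraint_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if all(token in vocab for token in tokens):
            kept.append(line.rstrip("\n"))
        else:
            dropped.append(line.rstrip("\n"))
    return kept, dropped


if __name__ == "__main__":
    # Toy data for illustration only.
    corpus = ["good bad happy", "good sad"]
    constraints = ["good bad happy sad\n", "good glad bad sad\n"]
    vocab = build_vocab(corpus)
    kept, dropped = filter_constraints(constraints, vocab)
    print("kept:", kept)
    print("dropped:", dropped)
```

Writing the `kept` lines to new files and passing those to -sem-train and -sem-valid would be one way to apply the advice. Whether this removes the crash in a given setup still has to be confirmed by rerunning the training.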
On February 24, 2017, at 20:08, bbking notifications@github.com wrote:
Hello, how should I train the word vectors? Suppose I want to train SWE+Synon-Anton. I tried the following steps:
Machine configuration: Ubuntu 14.04, 128 GB RAM
1. Provide the Wikipedia corpus train.txt.
2. Split semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton into sem.train.txt and sem.valid.txt.
3. Run ./SWE_Train -train train.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 0 -iter 3 -sem-train sem.train.txt -sem-valid sem.valid.txt -sem-coeff 0.1 -sem-hinge 0.0 -sem-addtime 0 -weight-decay 0 -delta-left 1 -delta-right 1
Problem: a Segmentation fault occurs after the corpus has been read!
The log is as follows:
Semantic Word Embedding (SWE) Toolkit
Train Setting embedding size: 200
Train Setting window size: 5
Train Setting sample value: 0.000100
Train Setting negative num: 5
Running Threads: 12
Iteration Times: 3
SemWE Qsem train file: ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.train
SemWE Qsem valid file: ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.valid
SemWE Add Time(/%): 0.000000
SemWE Weight Decay: 0.000000
SemWE Inter Coeff: 0.100000
SemWE Norm Hinge Margin: 0.000000
SemWE Inequation Delta Left: 1
SemWE Inequation Delta Right: 1
Training Starting @Time: Fri Feb 24 19:21:08 2017
Starting training using file wikicorpus.1b
Vocab size: 218317
Words in train file: 123353508
Load Training Word Knowledge from file ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.train
--- InEquation Nums: 424732
--- Finish reading the Knowledge Database
Load CV Test Word Knowledge from file ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.valid
--- CV set InEquation Nums: 1000
./run.sh: line 5: 25479 Segmentation fault (core dumped) ./SWE_Train -train ${TRAIN_FILE} -output vec.bin -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 1 -cbow 0 -iter 3 -sem-train ${SEW_FILE} -sem-valid ${SEW_CV_FILE} -sem-coeff 0.1 -sem-hinge 0.0 -sem-addtime 0 -weight-decay 0 -delta-left 1 -delta-right 1
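Thank you very much! One more question: what ratio should I use to split the semantic constraint data into train and valid sets? I want to run comparison experiments with the model at its best.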
Typically 5% to 20% of the data is held out as the development set.
Liu Quan
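A split along these lines could be sketched as follows. This is a minimal illustration, not the toolkit's own procedure: `split_constraints`, `dev_fraction`, and `seed` are hypothetical names, and the sketch assumes one constraint per line so that shuffling lines is safe.

```python
import random


def split_constraints(lines, dev_fraction=0.1, seed=0):
    """Shuffle constraint lines and hold out dev_fraction of them as a dev set.

    Returns (train_lines, dev_lines). A fixed seed keeps the split reproducible.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one line so the dev set is never empty.
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    # Toy data for illustration only.
    constraints = [f"constraint-{i}" for i in range(100)]
    train, dev = split_constraints(constraints, dev_fraction=0.1)
    print(len(train), len(dev))
```

Any `dev_fraction` between 0.05 and 0.2 falls inside the range suggested above. Keeping the seed fixed makes later comparison experiments use the same split.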
OK, thanks a lot for the guidance!
Addendum: training word vectors with word2vec works fine.
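Problem: a Segmentation fault occurs after the corpus has been read!
Also, what ratio should be used to split the WordNet data into train and valid sets? The paper does not mention it.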