hankcs / multi-criteria-cws

Simple Solution for Multi-Criteria Chinese Word Segmentation
http://www.hankcs.com/nlp/segment/multi-criteria-cws.html
GNU General Public License v3.0
300 stars, 84 forks

Source of the pretrained word embeddings #8

Open zjuwfz opened 5 years ago

zjuwfz commented 5 years ago

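Hello, I have recently been running BiLSTM-CRF word segmentation experiments, and using the pretrained word embeddings from your project improved my results by two points. Where do these word embeddings come from? Or did you train them yourself?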

hankcs commented 5 years ago

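Thanks for using it; that is an encouraging result. My word embeddings (actually character embeddings) take into account character-formation information such as the radicals of Chinese characters, and were trained with fastText's General Continuous Skip-Gram (SG) Model. For the principle behind this kind of character vector, see https://arxiv.org/pdf/1712.08841.pdf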
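
For readers who want to try something similar, here is a minimal sketch of one possible way to expose radical information to fastText's subword n-grams. It is an illustration only, not the actual pipeline used for this repository's embeddings: the `RADICALS` table, the `char_token` and `to_training_line` helpers, and the idea of appending components to each character token are all assumptions made for this example.

```python
# Hypothetical toy decomposition table (an assumption for illustration).
# A real table would come from a character-decomposition resource,
# which this thread does not name.
RADICALS = {
    "河": "氵可",
    "湖": "氵胡",
    "海": "氵每",
}


def char_token(ch: str) -> str:
    """Return the character followed by its components, if known.

    Characters missing from the table are returned unchanged.
    """
    return ch + RADICALS.get(ch, "")


def to_training_line(sentence: str) -> str:
    """Turn a raw sentence into space-separated per-character tokens."""
    return " ".join(char_token(ch) for ch in sentence if not ch.isspace())


if __name__ == "__main__":
    # Each character becomes one "word"; shared components such as 氵
    # then show up as shared character n-grams inside those words.
    print(to_training_line("河海 湖"))
```

Lines produced this way could be written to a text file and passed to fastText's skip-gram mode, for example `fasttext skipgram -input corpus.txt -output char_vec -minn 1 -maxn 2`, so that tokens sharing a component also share subword n-grams. The file names and n-gram lengths here are placeholders, not settings confirmed in this thread.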

zjuwfz commented 5 years ago

Thank you very much!