qichaotang97 / word2vec_src_reading

Commented (but unaltered) version of original word2vec C implementation.
Apache License 2.0

word2vec source code reading notes #1

Open qichaotang opened 5 years ago

qichaotang commented 5 years ago

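From the code's point of view, CBOW and skip-gram are solved in essentially the same way.

CBOW: the vectors of the words in the window are averaged to predict the target word.

skip-gram: one word is used, in a loop, to predict several target words (the words in its window).

Difference: CBOW trains faster than skip-gram; skip-gram handles rare (low-frequency) words better than CBOW.

Optimization methods: hierarchical softmax (updates every non-leaf node on the target word's path, i.e. a sequence of binary classifications down the tree) and negative sampling (a binary classification of the true word against sampled noise words).

Negative sampling: words are sampled by frequency, with each count raised to the 3/4 power, so the sampling distribution is a frequency-based weighting. Relative to raw frequency, this lowers the probability of high-frequency words more than that of low-frequency words, which raises the chance of drawing low-frequency words. It is a "smoothing" strategy.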
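
The 3/4-power smoothing can be checked on a toy example. The following is only a minimal sketch: the word counts are made up, and the function names `negative_sampling_probs` and `draw_negatives` are hypothetical helpers, not names from word2vec.c. It uses a weighted sampler for brevity, whereas the original C code precomputes a lookup table for the same distribution.

```python
import random
from collections import Counter


def negative_sampling_probs(counts, power=0.75):
    """Return P(w) proportional to count(w) ** power."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


def draw_negatives(probs, target, k, rng):
    """Draw k negative words from probs, skipping the target word itself."""
    words = list(probs)
    weights = [probs[w] for w in words]
    negatives = []
    while len(negatives) < k:
        w = rng.choices(words, weights=weights, k=1)[0]
        if w != target:
            negatives.append(w)
    return negatives


# Hypothetical toy counts, for illustration only.
counts = Counter({"the": 1000, "cat": 50, "zymurgy": 1})

# Raw unigram distribution (power 1) versus the smoothed one (power 3/4).
raw_total = sum(counts.values())
raw = {w: c / raw_total for w, c in counts.items()}
smoothed = negative_sampling_probs(counts)

# Seeded generator so repeated runs draw the same negatives.
negatives = draw_negatives(smoothed, target="cat", k=5, rng=random.Random(0))
```

Comparing `raw` with `smoothed` shows the effect described above: the share of "the" shrinks while the share of "zymurgy" grows.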

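TensorFlow version of word2vec: a softmax layer usually has the form softmax(Wx+b). Since the shape of the matrix W is in fact the same as the shape of the word-embedding matrix, the article considers a model in which the softmax layer and the embedding layer share the weight W.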
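
To make the weight-sharing idea concrete, here is a small pure-Python sketch (not TensorFlow code, and not taken from the article). The matrix values and the `predict` helper are hypothetical. The same `W` serves both as the embedding lookup table and as the softmax weights. By contrast, the original C implementation keeps separate input and output matrices (`syn0` and `syn1`/`syn1neg`).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy matrix W: one row per vocabulary word (V = 3, d = 2).
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def predict(word_id, bias=None):
    """Compute softmax(W x + b), where x is itself a row of the same W."""
    x = W[word_id]  # embedding lookup: W used as the input layer
    b = bias if bias is not None else [0.0] * len(W)
    # Output layer reuses W: one logit per vocabulary word.
    logits = [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
        for row, b_j in zip(W, b)
    ]
    return softmax(logits)


probs = predict(0)
```

Sharing is only possible because both layers have shape V x d. Whether it helps is a modeling choice that the sketch does not evaluate.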

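Code gripes: the style is rather messy. The two methods could be abstracted to cut the line count; when time permits, the code could be refactored in C++11.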

qichaotang commented 5 years ago

word2vec word-vector source code reading and study notes (https://blog.csdn.net/hjimce/article/details/51564783)

qichaotang commented 5 years ago

A more detailed reading of the word2vec source code (https://blog.csdn.net/leiting_imecas/article/details/72303044)

qichaotang commented 5 years ago

Walkthrough of the Huffman tree construction in word2vec (https://blog.csdn.net/lingerlanlan/article/details/38048335)

qichaotang commented 5 years ago

word2vec source code and formulas explained (http://www.hankcs.com/nlp/word2vec.html) (https://daiwk.github.io/posts/nlp-word2vec.html)

qichaotang commented 5 years ago

A more detailed explanation of the methods in the word2vec source code (http://www.cnblogs.com/neopenx/p/4571996.html)

qichaotang commented 5 years ago

word2vec principles explained in detail (http://www.cnblogs.com/peghoty/p/3857839.html)

qichaotang commented 5 years ago

Analysis of the word2vec source code flow (https://blog.csdn.net/google19890102/article/details/51887344) (https://github.com/zhaozhiyong19890102/OpenSourceReading/blob/master/word2vec/word2vec.c)

qichaotang commented 5 years ago

word2vec resources (http://mccormickml.com/2016/04/27/word2vec-resources/)

qichaotang commented 5 years ago

word2vec C++11 implementation (https://github.com/maxoodf/word2vec)

qichaotang commented 5 years ago

TODO: add the loss of the algorithm and the losses of the optimization methods, with the formula derivations.
Reference: https://zhuanlan.zhihu.com/p/58425003
Reference: word2vec Parameter Learning Explained: https://arxiv.org/pdf/1411.2738.pdf
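
As a starting point for this TODO, the two losses can be written down in the notation of "word2vec Parameter Learning Explained". This is a sketch for a single training pair, not a full derivation. Here h is the hidden-layer vector (the input word's vector for skip-gram, the window average for CBOW), v'_w is the output vector of word w, and sigma is the logistic function.

```latex
% Logistic function
\sigma(x) = \frac{1}{1 + e^{-x}}

% Negative sampling: w_O is the true output word, W_neg the set of sampled noise words
E_{\mathrm{NEG}} = -\log \sigma\!\left({v'_{w_O}}^{\top} h\right)
                   - \sum_{w_j \in \mathcal{W}_{\mathrm{neg}}} \log \sigma\!\left(-{v'_{w_j}}^{\top} h\right)

% Gradient with respect to the score of any w_j in {w_O} \cup W_neg,
% where t_j = 1 if w_j = w_O and t_j = 0 otherwise
\frac{\partial E_{\mathrm{NEG}}}{\partial \left({v'_{w_j}}^{\top} h\right)}
    = \sigma\!\left({v'_{w_j}}^{\top} h\right) - t_j

% Hierarchical softmax: n(w, j) is the j-th node on the path from the root to w,
% L(w) is the length of that path, ch(n) is the left child of n,
% and [[x]] = 1 if x is true, -1 otherwise
p(w = w_O) = \prod_{j=1}^{L(w_O) - 1}
    \sigma\!\left( [\![\, n(w_O, j+1) = \mathrm{ch}(n(w_O, j)) \,]\!] \cdot {v'_{n(w_O, j)}}^{\top} h \right)

E_{\mathrm{HS}} = -\log p(w = w_O)
```

In both cases the loss is a sum of binary logistic terms, which matches the observation that hierarchical softmax is several binary classifications along a tree path and negative sampling is a binary classification against noise words.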

qichaotang commented 5 years ago

word2vec interview questions: https://blog.csdn.net/zhangxb35/article/details/74716245
N questions about word2vec: https://zhuanlan.zhihu.com/p/43214781

qichaotang commented 5 years ago

[NLP] Grasping the essence of Word2vec word vectors in seconds (https://zhuanlan.zhihu.com/p/26306795)

qichaotang commented 5 years ago

The mathematical principles of Word2vec, all in one place (http://shomy.top/2017/07/28/word2vec-all/)