Network depth
In most cases, however, the performance improvement from making the model deeper than 2 layers is minimal (Reimers & Gurevych, 2017). These observations hold for most sequence tagging and structured prediction problems. For classification, deep or very deep models perform well only with character-level input, and shallow word-level models are still the state of the art (Zhang et al., 2015; Conneau et al., 2016; Le et al., 2017).
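A minimal PyTorch sketch of what this recommendation looks like in practice: a sequence tagger capped at 2 BiLSTM layers. PyTorch itself, the class name, and all dimensions are my illustrative assumptions, not from the cited papers.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Sequence tagger with a 2-layer BiLSTM, the depth beyond which
    gains are reportedly minimal (sizes are illustrative)."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # num_layers=2: deeper stacks rarely help for tagging tasks
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):           # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.proj(states)             # (batch, seq_len, num_tags)

# Usage: score a dummy batch of 3 sentences of length 10
model = BiLSTMTagger(vocab_size=10000, num_tags=17)
scores = model(torch.randint(0, 10000, (3, 10)))
print(scores.shape)  # torch.Size([3, 10, 17])
```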
Optimization
Adam (Kingma & Ba, 2015) is one of the most popular and widely used optimization algorithms and often the go-to optimizer for NLP researchers. It is often thought that Adam clearly outperforms vanilla stochastic gradient descent (SGD). However, while Adam converges much faster, SGD with learning rate annealing has been observed to slightly outperform it (Wu et al., 2016). Recent work furthermore shows that SGD with properly tuned momentum outperforms Adam (Zhang et al., 2017).
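A short sketch of the SGD-with-momentum-plus-annealing setup described above, in PyTorch. The model, data, momentum value, and annealing schedule are all illustrative assumptions; the cited papers tune these per task.

```python
import torch

# Stand-in model and data; names and sizes are illustrative
model = torch.nn.Linear(100, 10)
x, y = torch.randn(32, 100), torch.randn(32, 10)
loss_fn = torch.nn.MSELoss()

# SGD with tuned momentum plus learning-rate annealing,
# the combination reported above to edge out Adam
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Halve the learning rate every 10 epochs (schedule is illustrative)
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)

for epoch in range(30):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    scheduler.step()  # anneal once per epoch

# Adam baseline for comparison (faster early convergence):
# adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```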
Preprocessing
`u'([\u4E00-\u9FA5a-zA-Z0-9+_]+)'`

Use the pattern above to strip special characters and punctuation (note: because it relies on Unicode ranges, the input word must first be decoded with decode('utf8')); useful for text classification. A sketch of its use follows below.
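A minimal, runnable sketch of applying this pattern for preprocessing. The helper name `clean` and the sample text are my own; the decode step mirrors the note above (Python 3 strings are already Unicode, so decoding is only needed for bytes input).

```python
# -*- coding: utf-8 -*-
import re

# Keep only CJK characters, ASCII letters, digits, '+' and '_';
# everything else (punctuation, special symbols) is dropped
PATTERN = re.compile(u'([\u4E00-\u9FA5a-zA-Z0-9+_]+)')

def clean(text):
    # Decode bytes first so the Unicode range applies (cf. note above)
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    return ' '.join(PATTERN.findall(text))

print(clean(u'C++很棒!!  email: foo_bar@x.com'))
# -> 'C++很棒 email foo_bar x com'
```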
Code
Found a new line of thinking
Researchers
Papers