Closed artpli closed 4 years ago
Addendum:
Adding feature diversity here means combining low-level features with high-level features; at this point the book merely cites the original source of skip connections. Later work has also adopted this practice, e.g. Enhanced LSTM for Natural Language Inference and Dissecting Contextual Word Embeddings: Architecture and Representation.
Also, we cannot say that adding skip connections did not improve the results; the original paper states:
The results do not allow to say whether the direct connections from input to output are useful or not, but suggest that on a smaller corpus at least, better generalization can be obtained without the direct input-to-output connections, at the cost of longer training: without direct connections the network took twice as much time to converge (20 epochs instead of 10), albeit to a slightly lower perplexity.
The model with skip connections converged twice as fast as the model without them (10 epochs versus 20), and the difference in results was not very significant.
As for the skip connection in Equation 15.27, the input word vector e_t at time step t corresponds one-to-one with the output of g(·), so no change should be needed.
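To make the direct input-to-output connection under discussion concrete, here is a minimal NumPy sketch of Bengio et al.'s feed-forward language model, y = b + We + U tanh(d + He). The dimensions and variable names below are my own illustrative assumptions, not the book's Equation 15.27 notation; the point is only to show which term the skip connection contributes.

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h = 10, 4, 5   # assumed vocab size, concatenated-embedding dim, hidden dim

e = rng.standard_normal(m)                                     # input word vectors e_t (concatenated)
H = rng.standard_normal((h, m)); d = rng.standard_normal(h)    # hidden (tanh) layer
U = rng.standard_normal((V, h)); b = rng.standard_normal(V)    # output layer
W = rng.standard_normal((V, m))                                # direct input-to-output weights

hidden = np.tanh(d + H @ e)

# With the skip connection: low-level embedding features (W @ e) are merged
# directly into the output alongside the high-level hidden features.
y_with_skip = b + W @ e + U @ hidden

# Without the skip connection (W effectively zero), only the hidden-layer
# path contributes; per Bengio et al., this variant trained about twice as
# long (20 epochs vs 10) for a slightly lower perplexity on a small corpus.
y_no_skip = b + U @ hidden

print(y_with_skip.shape, y_no_skip.shape)  # both (V,) = (10,)
```

The difference between the two outputs is exactly the skip-connection term W @ e, which is the "feature diversity" being added: raw embedding features reaching the output without passing through the nonlinearity.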
Why was this closed 😳? The other suggestions should be fine. Could you reopen it?
I am checking some references and want to confirm before reopening :)
- [ ] Using ML to denote maximum likelihood estimation is fine, since the estimated value itself is called an estimate, and the context already makes this clear enough;
- [ ] Regarding skip connections:
Adding feature diversity here means combining low-level features with high-level features; at this point the book merely cites the original source of skip connections. Later work has also adopted this practice, e.g. Enhanced LSTM for Natural Language Inference and Dissecting Contextual Word Embeddings: Architecture and Representation.
Also, we cannot say that adding skip connections did not improve the results; the original paper states:
The results do not allow to say whether the direct connections from input to output are useful or not, but suggest that on a smaller corpus at least, better generalization can be obtained without the direct input-to-output connections, at the cost of longer training: without direct connections the network took twice as much time to converge (20 epochs instead of 10), albeit to a slightly lower perplexity.
The model with skip connections converged twice as fast as the model without them (10 epochs versus 20), and the difference in results was not very significant.
As for the skip connection in Equation 15.27, the input word vector e_t at time step t corresponds one-to-one with the output of g(·), so no change should be needed.
Thank you, fixed.
Regarding the "skip connections" part of the issues above, I (the student) did not consult further references; the suggestions are based only on a reading of Bengio's paper. The teacher may have other considerations, and I would appreciate your guidance.