ictnlp / OR-NMT

Source Code for ACL2019 paper <Bridging the Gap between Training and Inference for Neural Machine Translation>

About Oracle Word Selection #8

Open goodluck110706112 opened 4 years ago

goodluck110706112 commented 4 years ago

Following the paper's approach, I applied Oracle Word Selection + scheduled sampling on an LSTM, but the improvement was very marginal, so I suspect some of my settings are wrong. I also have a question about Oracle Word Selection:

In Equation 11 of the paper, I directly select argmax(o_{j-1}) as the final oracle word, without applying softmax. My reasoning is that argmax(o_{j-1}) is the same as argmax(P_{j-1}), i.e., the oracle word can be obtained without the softmax. So why is Equation 12 (the softmax) needed here?

zhang-wen commented 4 years ago

Reply to: "Following the paper's approach, I applied Oracle Word Selection + scheduled sampling on an LSTM, but the improvement was very marginal, so I suspect some of my settings are wrong." Yes, you need to carefully select the hyperparameter k according to the model architecture and the dataset. I can't find the parameters of the RNN-based model now, sorry about that. Please refer to the hyperparameter selection for the attention-based model in the README.

Reply to: "In Equation 11 of the paper, I directly select argmax(o_{j-1}) as the final oracle word, without applying softmax. My reasoning is that argmax(o_{j-1}) is the same as argmax(P_{j-1}), so why is Equation 12 (the softmax) needed?" Yes, you are right: argmax(o_{j-1}) is the same as argmax(P_{j-1}). In fact, the softmax operation is not needed in the code implementation.
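To illustrate the point above, here is a minimal numpy sketch of word-level oracle selection with Gumbel noise in the spirit of Eqs. 11-12 (the function name and the `tau` parameter are my own; this is not the repo's actual implementation). Since softmax is strictly order-preserving, applying it before the argmax cannot change which word is selected, which is why the code path can skip it:

```python
import numpy as np

def select_oracle_word(logits, tau=1.0, rng=np.random):
    """Pick the oracle word from the previous step's logits o_{j-1}.

    Hypothetical sketch: perturb the logits with Gumbel(0, 1) noise
    (Eq. 11), then take the argmax directly.  Eq. 12's softmax is
    omitted because it does not change the argmax.
    """
    # Gumbel(0, 1) noise, sampled as -log(-log(u)) for u ~ Uniform(0, 1);
    # the lower bound 1e-10 avoids log(0).
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    perturbed = (logits + gumbel) / tau
    # Eq. 12 would compute softmax(perturbed) here, but softmax is a
    # monotonic transformation, so argmax is unchanged and we skip it.
    return perturbed.argmax(axis=-1)
```

The noise makes the selection stochastic during training, but for any fixed perturbed logits, `argmax(softmax(x)) == argmax(x)`, which is exactly the equivalence the question points out.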