proposes the MIXER training scheme to resolve the exposure bias issue
Details
exposure bias
at train time, the model is trained to predict the next token given the previous ground-truth tokens, but at test time it must predict the next token given its own previously generated tokens
the discrepancy that arises because the model is only ever exposed to the training data distribution, never to its own predictions, is called exposure bias (see the sketch below)
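To make the train/test mismatch concrete, here is a minimal sketch contrasting teacher-forced training with free-running decoding. The single-layer LSTM decoder, its sizes, and the helper names are illustrative assumptions for this note, not the paper's exact model.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    # Illustrative single-layer LSTM decoder; structure and sizes are assumptions.
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def step(self, token, state):
        # One decoding step: previous token in, next-token logits out.
        h, c = self.cell(self.embed(token), state)
        return self.out(h), (h, c)

def xent_training_step(decoder, target, state=None):
    # Train time (teacher forcing): every step is conditioned on the ground-truth previous token.
    loss = 0.0
    token = target[:, 0]  # <bos>
    for t in range(1, target.size(1)):
        logit, state = decoder.step(token, state)
        loss = loss + nn.functional.cross_entropy(logit, target[:, t])
        token = target[:, t]  # ground truth is fed back, regardless of the model's prediction
    return loss

def greedy_decode(decoder, bos, state=None, max_len=50):
    # Test time (free running): every step is conditioned on the model's own previous prediction,
    # a distribution the model never saw during training; this mismatch is the exposure bias.
    token, output = bos, []
    for _ in range(max_len):
        logit, state = decoder.step(token, state)
        token = logit.argmax(dim=-1)
        output.append(token)
    return output
```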
MIXER
trains with XENT only for the first N_xent epochs, then mixes the two losses within each sequence: XENT (teacher forcing) on the earlier time steps and REINFORCE on the final Δ steps, with Δ gradually increased until the whole sequence is trained with REINFORCE
Mixed Incremental Cross-Entropy Reinforce : combines cross-entropy with REINFORCE in an incremental (i.e. curriculum learning) fashion to mitigate exposure bias in cross-entropy training and to tame the exponentially large search space of REINFORCE training (see the sketch after the pseudo-code note below)
pseudo code
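The paper's pseudo code is not reproduced in this note. Below is a rough sketch, reusing the illustrative Decoder above, of one annealing-phase MIXER update as I understand it: cross-entropy on the first T - Δ steps (teacher forcing) and REINFORCE on the last Δ steps using a sequence-level reward. The `reward_fn` helper, the scalar baseline, and any constants are simplified assumptions; the paper uses a learned baseline.

```python
import torch
import torch.nn.functional as F

def mixer_update(decoder, target, delta, reward_fn, state=None, baseline=0.0):
    # One annealing-phase update on a batch of sequences (sketch).
    # Steps t < T - delta: teacher forcing + cross-entropy (XENT region).
    # Steps t >= T - delta: sample from the model + REINFORCE (REINFORCE region).
    T = target.size(1)
    xent_loss, log_probs, sampled = 0.0, [], []

    token = target[:, 0]  # <bos>
    for t in range(1, T):
        logit, state = decoder.step(token, state)
        if t < T - delta:
            # XENT region: loss against the ground-truth token, which is also fed back.
            xent_loss = xent_loss + F.cross_entropy(logit, target[:, t])
            token = target[:, t]
        else:
            # REINFORCE region: the model conditions on its own sampled token.
            dist = torch.distributions.Categorical(logits=logit)
            token = dist.sample()
            log_probs.append(dist.log_prob(token))
            sampled.append(token)

    reinforce_loss = 0.0
    if sampled:
        # Sequence-level reward (e.g. sentence BLEU); reward_fn is a hypothetical helper.
        reward = reward_fn(torch.stack(sampled, dim=1), target)
        advantage = reward - baseline  # the paper learns the baseline; a constant is used here
        reinforce_loss = (-advantage * torch.stack(log_probs).sum(dim=0)).mean()

    return xent_loss + reinforce_loss
```

The curriculum then amounts to calling this update with delta = 0 (pure XENT) for the first N_xent epochs and increasing delta every few epochs until it covers the whole sequence; the exact epoch counts and increment are hyperparameters from the paper that are not reproduced here.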
Experiment
Machine Translation task : IWSLT14 De-En, ~153K training sentence pairs, single-layer LSTM with 256 hidden units
MIXER does improve BLEU, by about +3.0 points
Personal Thoughts
exposure bias is an important issue in MT
there are other biases, including the greedy/beam-search discrepancy between train and test phases
let's look into more papers regarding the above issues
MIXER seems to be a useful, model-agnostic trick to improve MT results, but it did not see wide usage, perhaps due to the instability of REINFORCE
Link : https://arxiv.org/pdf/1511.06732.pdf
Authors : Ranzato et al. 2016