MetaMind's submission to the WMT16 EnDe news-domain translation task
MetaMind is the deep learning lab of Salesforce, led by Richard Socher (ImageNet, GloVe, Dynamic Memory Networks, QRNN, Weighted Transformer, Non-Autoregressive NMT)
Details
Model Descriptions
Standard LSTM model
Y-LSTM Model (what is the tracker doing? What does it mean?)
Encoder : 5-layer stacked LSTM RNN-LM with subword-vector inputs, whose top-most output state is fed to a softmax layer that predicts the next input token.
Y : the middle (l=3) layer of the encoder is connected recurrently to a single-layer LSTM called the 'tracker'
Decoder : RNN-LM with a tracker LSTM, identical to the encoder except that the hidden and memory states of the decoder's tracker are replaced at each timestep with an attentional sum of the encoder's saved tracker states
The primary contribution of this model is to demonstrate that purely attentional NMT is possible
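A minimal sketch of the decoder-side tracker replacement described above: at each decoder timestep, the tracker's hidden state is swapped for an attentional sum over the encoder's saved tracker states. Dot-product scoring and the function name `attentional_sum` are my assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def attentional_sum(enc_tracker_states, query):
    # enc_tracker_states: (T, d) tracker states saved during encoding
    # query: (d,) the decoder tracker's hidden state before replacement
    scores = enc_tracker_states @ query        # dot-product scores, shape (T,)
    scores -= scores.max()                     # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over timesteps
    return weights @ enc_tracker_states        # weighted sum, shape (d,)

# Toy example (random states stand in for real tracker outputs)
rng = np.random.default_rng(0)
enc_states = rng.standard_normal((6, 4))       # 6 source timesteps, d=4
decoder_hidden = rng.standard_normal(4)
new_hidden = attentional_sum(enc_states, decoder_hidden)
```

In the model, the same replacement would apply to both the hidden and memory states of the decoder's tracker at every timestep.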
Result
2nd place overall
The Y-LSTM alone scored lower than the standard single LSTM, but gave a boost in the ensemble
Personal Thoughts
Maybe the Y-LSTM was very similar to the Transformer's fully attentional approach to NMT
Link : https://aclweb.org/anthology/W/W16/W16-2308.pdf
Authors : Bradbury and Socher, 2016