MetaMind's submission to the WMT16 EnDe news-domain translation task
MetaMind is the deep learning lab of Salesforce, led by Richard Socher (ImageNet, GloVe, Dynamic Memory Networks, QRNN, Weighted Transformer, Non-Autoregressive NMT)
Details
Model Descriptions
Standard LSTM model
Y-LSTM Model (what is the tracker doing? What does it mean?)
Encoder : 5-layer stacked LSTM RNN-LM with subword-vector inputs, whose top-most output state is fed to a softmax layer that predicts the next input token.
Y : the middle (l=3) layer of the encoder is connected recurrently to a single-layer LSTM called the 'tracker'
Decoder : RNN-LM with a tracker LSTM, identical to the encoder except that the hidden and memory states of the decoder's tracker are replaced at each timestep with an attentional sum of the encoder's saved tracker states
The primary contribution of this model is to demonstrate that purely attentional NMT is possible
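A minimal sketch of the decoder-side tracker replacement described above: at each decoder timestep, the tracker's hidden state is swapped for an attentional sum over the encoder's saved tracker states. Dot-product scoring and the function name `attentional_sum` are my assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def attentional_sum(enc_tracker_states, query):
    # enc_tracker_states: (T, d) tracker states saved during encoding
    # query: (d,) the decoder tracker's hidden state before replacement
    scores = enc_tracker_states @ query        # dot-product scores, shape (T,)
    scores -= scores.max()                     # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over timesteps
    return weights @ enc_tracker_states        # weighted sum, shape (d,)

# Toy example (random states stand in for real tracker outputs)
rng = np.random.default_rng(0)
enc_states = rng.standard_normal((6, 4))       # 6 source timesteps, d=4
decoder_hidden = rng.standard_normal(4)
new_hidden = attentional_sum(enc_states, decoder_hidden)
```

In the model, the same replacement would apply to both the hidden and memory states of the decoder's tracker at every timestep.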
Result
2nd place overall
The Y-LSTM alone scored lower than the standard single LSTM, but gave a boost in the ensemble
Personal Thoughts
Maybe the Y-LSTM was very similar to the Transformer's fully attentional approach to NMT
Link : https://aclweb.org/anthology/W/W16/W16-2308.pdf
Authors : Bradbury and Socher, 2016