Abstract
Apply key modeling and training techniques from the Transformer to RNNs, yielding a new RNMT+ model that outperforms the existing RNMT, ConvS2S and Transformer models
Hybrid models obtain further improvements
Details
Introduction
NMT models have evolved from RNMT to CNN-based models and now to the Transformer
RNMT
The Google NMT (GNMT) system is based on RNMT
Strong in its sequential nature, with potentially infinite memory
CNN
Faster training and inference than RNMT, thanks to its stacked CNN encoder-decoder
Requires meticulous design of gradient scaling for stable training
Transformer
Current SoTA in NMT
Faster in training and inference, but the decoder has no recurrent memory
Each sub-layer is wrapped as normalize > transform > dropout > residual-add, applied in sequence (sketched below)
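To make that ordering concrete, here is a minimal PyTorch-style sketch of a pre-norm sub-layer wrapper; this is my own illustration rather than code from the paper, and the name PreNormSublayer as well as the feed-forward sizes are hypothetical.

```python
# Minimal sketch of the sub-layer ordering noted above:
# normalize > transform > dropout > residual-add (not code from the paper).
import torch
import torch.nn as nn


class PreNormSublayer(nn.Module):
    """Wraps any transformation (self-attention, feed-forward, ...) as:
    normalize -> transform -> dropout -> residual-add."""

    def __init__(self, d_model: int, transform: nn.Module, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)    # 1. normalize the sub-layer input
        self.transform = transform           # 2. transform (attention, FFN, ...)
        self.dropout = nn.Dropout(dropout)   # 3. dropout on the transformed output
        # 4. the residual-add happens in forward()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.dropout(self.transform(self.norm(x)))


# Usage: wrap a position-wise feed-forward block (hypothetical sizes).
d_model = 512
ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
sublayer = PreNormSublayer(d_model, ffn)
out = sublayer(torch.randn(8, 20, d_model))  # (batch, length, d_model)
```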
RNMT+
Results
Ablation Experiments
Encoder-Decoder hybrid
Personal Thoughts
Link : https://arxiv.org/pdf/1804.09849v1.pdf Authors : Chen et al. 2018