This paper proposes to initialize the weights of the encoder and decoder in a neural sequence-to-sequence (Seq2Seq) model with two pre-trained language models, and then fine-tune them on labeled data. During the fine-tuning phase, they jointly train the Seq2Seq objective with the language modeling (LM) objectives to prevent overfitting. The experimental results show that their method achieves an improvement of 1.3 BLEU over the previous best models on both WMT'14 and WMT'15 English->German. Human evaluation on abstractive summarization also shows that their method outperforms a purely supervised learning baseline. (See more important insights in the ablation studies below.)
This paper is also similar to #3, #5 and #6.
After the Seq2Seq model is initialized with the pre-trained language models (see Figure 1), it is fine-tuned with both the translation objective and the language modeling objectives (multi-task learning) to avoid catastrophic forgetting (Goodfellow et al., 2013). A minimal sketch of this recipe is given below.
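The following PyTorch sketch illustrates the recipe as I understand it: pre-train a source-side and a target-side LM, reuse their weights to initialize the Seq2Seq encoder and decoder, and then fine-tune with the translation loss plus both LM losses. Module names, sizes, and the equal loss weighting are my own assumptions; the paper's actual architecture also uses attention and residual connections, which are omitted here.

```python
# Minimal sketch of pre-training + joint fine-tuning (illustrative, not the paper's exact model).
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)  # next-token logits

# 1) Pre-train one LM on source-language text and one on target-language text (training loop not shown).
src_lm, tgt_lm = LSTMLanguageModel(), LSTMLanguageModel()

# 2) Initialize the Seq2Seq encoder/decoder from the pre-trained LM weights (here: shared modules).
class Seq2Seq(nn.Module):
    def __init__(self, src_lm, tgt_lm):
        super().__init__()
        self.src_embed, self.encoder = src_lm.embed, src_lm.lstm
        self.tgt_embed, self.decoder = tgt_lm.embed, tgt_lm.lstm
        self.out = tgt_lm.out  # output softmax reused from the target-side LM

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_embed(src))
        h, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.out(h)

model = Seq2Seq(src_lm, tgt_lm)
ce = nn.CrossEntropyLoss()

# 3) Fine-tune with the translation loss plus both LM losses (multi-task learning),
#    which is what regularizes against catastrophic forgetting.
src = torch.randint(0, VOCAB, (8, 12))   # toy batch of source sentences
tgt = torch.randint(0, VOCAB, (8, 10))   # toy batch of target sentences

mt_loss = ce(model(src, tgt[:, :-1]).flatten(0, 1), tgt[:, 1:].flatten())
src_lm_loss = ce(src_lm(src[:, :-1]).flatten(0, 1), src[:, 1:].flatten())
tgt_lm_loss = ce(tgt_lm(tgt[:, :-1]).flatten(0, 1), tgt[:, 1:].flatten())
loss = mt_loss + src_lm_loss + tgt_lm_loss  # equal weighting is an assumption
loss.backward()
```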
Results on the machine translation task:
Figure 3 shows important insights from the ablation studies on machine translation.
Figure 4 shows that the models with unsupervised pre-training degrade less than those without it as the labeled dataset becomes smaller.
Results on the abstractive summarization task:
The model with unsupervised pre-training is only able to match the baseline model, which uses word2vec pre-trained embeddings + a bi-LSTM encoder. Note that this paper uses only a uni-directional LSTM as the encoder in the Seq2Seq model. This result suggests that pre-training the embeddings alone already gives a large improvement.
Figure 5 shows important insights from the ablation studies on abstractive summarization.
Finally, human evaluation shows that the unsupervised pre-training method not only achieves better generalization but also reduces unwanted repetition in abstractive summarization.
Maybe we can use a bi-LSTM to pre-train a bidirectional language model instead of the uni-directional pre-trained LSTM proposed in this paper; see the sketch below.
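A rough sketch of that idea, under my own assumptions: train a forward LM and a backward LM as two separate uni-directional LSTMs (ELMo-style, since a single bi-LSTM would see the future tokens it is asked to predict), and later use the two directions to initialize a bi-LSTM encoder. All names and sizes here are illustrative, not from the paper.

```python
# Illustrative forward/backward LM pre-training for a bidirectional encoder initialization.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128

class DirectionalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)

fwd_lm, bwd_lm = DirectionalLM(), DirectionalLM()
ce = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB, (8, 12))   # toy monolingual batch
rev = torch.flip(tokens, dims=[1])          # backward LM reads the sequence reversed

fwd_loss = ce(fwd_lm(tokens[:, :-1]).flatten(0, 1), tokens[:, 1:].flatten())
bwd_loss = ce(bwd_lm(rev[:, :-1]).flatten(0, 1), rev[:, 1:].flatten())
(fwd_loss + bwd_loss).backward()
# The two LSTMs' weights could then initialize the forward and backward halves
# of a bi-LSTM encoder in the Seq2Seq model.
```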
Metadata
Authors: Prajit Ramachandran, Peter J. Liu and Quoc V. Le
Organization: Google Brain
Conference: EMNLP 2017
Link: https://goo.gl/n2cKG9