Fully Unsupervised NMT using Monolingual Corpora only by FAIR
De-noising Auto-encoder + Language specific Decoder + Language Discriminator
Good paper from ICLR 2018
Enables better NMT for low-resource language pairs
Performance is still well below supervised NMT
Details
Key Idea
Build a common latent space between the two languages
Learn to translate by reconstructing in both domains according to two principles
(i) the model has to be able to reconstruct a sentence in a given language from a noisy version of it, as in standard de-noising auto-encoders
(ii) The model also learns to reconstruct any source sentence given a noisy translation of the same sentence in the target domain, and vice versa
Learning Objective
De-noising Auto-Encoder : Embed a noised sentence into the latent space and reconstruct the original
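The notes don't spell out the noise model; a minimal sketch of a typical corruption (word dropout plus a local shuffle within a window k, as in the paper; function and parameter names are my assumptions) could look like:

```python
import random

def add_noise(tokens, p_drop=0.1, k=3):
    """Corrupt a token list: drop each word with probability p_drop,
    then shuffle words at most ~k positions from their original index."""
    kept = [t for t in tokens if random.random() >= p_drop]
    # local shuffle: sort by original index plus uniform noise in [0, k + 1)
    keys = [i + random.uniform(0, k + 1) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]
```

With k=0 the sort keys stay monotone, so the order is preserved; larger k allows more reordering while keeping words near their original positions.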
Cross-Domain : Minimize reconstruction loss over the chain (source in lang1 -> latent space -> translation in lang2 -> back into latent space -> reconstructed source in lang1)
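A structural sketch of one cross-domain (back-translation) step; all function names here are hypothetical, and translate_l1_to_l2 stands for the frozen model from the previous iteration:

```python
def cross_domain_step(x_l1, translate_l1_to_l2, noise, encode, decode_l1, loss_fn):
    """One cross-domain step: lang1 -> lang2 -> lang1 reconstruction."""
    y_l2 = translate_l1_to_l2(x_l1)      # translation from the frozen previous model
    z = encode(noise(y_l2), lang="l2")   # corrupted translation -> shared latent space
    x_hat = decode_l1(z)                 # reconstruct the original lang1 sentence
    return loss_fn(x_hat, x_l1)          # reconstruction loss drives learning
```

The same step is applied with the two languages swapped, so both translation directions improve over iterations.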
Adversarial : A discriminator tries to identify the language from the latent embedding; the encoder tries to fool it by mapping semantically equivalent sentences to the same latent representation, independent of language
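The adversarial pair of losses can be sketched as below, assuming the discriminator outputs a probability p_l1 that the latent vector came from language 1 (names and the two-language setup are my assumptions):

```python
import math

def discriminator_loss(p_l1, true_lang):
    """Cross-entropy for the discriminator: predict the true source
    language of a latent vector; p_l1 is D's probability of lang1."""
    return -math.log(p_l1 if true_lang == "l1" else 1.0 - p_l1)

def encoder_adversarial_loss(p_l1, true_lang):
    """The encoder is trained on flipped labels, pushing the latent
    representation to be language-independent (i.e. to fool D)."""
    return -math.log(1.0 - p_l1 if true_lang == "l1" else p_l1)
```

The two losses pull in opposite directions: confident correct predictions are cheap for D but expensive for the encoder.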
Final Objective Function
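The notes don't reproduce the objective itself; reconstructed from the three terms above (the λ weights follow the paper's notation, exact symbols are my assumption):

```latex
\mathcal{L}(\theta_{enc}, \theta_{dec}) =
  \lambda_{auto}\big[\mathcal{L}_{auto}(src) + \mathcal{L}_{auto}(tgt)\big]
+ \lambda_{cd}\big[\mathcal{L}_{cd}(src \to tgt) + \mathcal{L}_{cd}(tgt \to src)\big]
+ \lambda_{adv}\,\mathcal{L}_{adv}
```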
Training
Model is bootstrapped from an unsupervised word-by-word translation model
Encoder tries to map the source sentence with noise into shared latent space, and reconstruct as in de-noising auto-encoder.
Decoder learns to reconstruct the input from the latent space, given a language flag
Discriminator tries to identify the source language in an adversarial setting
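The word-by-word bootstrap above can be sketched as a dictionary lookup; the dictionary here is a stand-in (the paper infers it without supervision via word-embedding alignment):

```python
def word_by_word_translate(tokens, bilingual_dict):
    """Naive initial translator: look each word up in an (unsupervised)
    bilingual dictionary, copying out-of-vocabulary words unchanged."""
    return [bilingual_dict.get(w, w) for w in tokens]
```

This crude first model only needs to be good enough to seed the cross-domain loop, which then iteratively improves on it.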
Model Selection
BLEU score for two-way (round-trip) translation is used as an evaluation metric, requiring no parallel data
It shows good correlation with classic BLEU computed against parallel references
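The round-trip criterion can be sketched as below; `metric` stands in for BLEU, which is not implemented here, and all names are assumptions:

```python
def round_trip_score(sents_l1, s2t, t2s, metric):
    """Translate lang1 -> lang2 -> lang1 and score the reconstruction
    against the original input -- no parallel data needed."""
    back = [t2s(s2t(s)) for s in sents_l1]
    scores = [metric(hyp, ref) for hyp, ref in zip(back, sents_l1)]
    return sum(scores) / len(scores)
```

In practice one would average this over both directions and pick the checkpoint maximizing the score.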
Results
Not sure the baselines were really meaningful
Unsupervised does learn something!
Monolingual vs Parallel Corpus
Training on ~10M monolingual sentences matches a supervised model trained on ~100K parallel sentences
Ablation Study
Drop subsets of the training scheme to see which components are critical for learning
De-noising Auto-Encoder and Cross-Domain are both critical
Personal Thoughts
Great work on unsupervised NMT
Better than Cho's paper because the whole pipeline is fully differentiable
Link : https://arxiv.org/pdf/1711.00043.pdf Authors : Lample et al. 2017