Fully Unsupervised NMT using Monolingual Corpora only by FAIR
De-noising Auto-encoder + Language specific Decoder + Language Discriminator
Good paper from ICLR 2018
Enables better NMT for low-resource language pairs
Performance is still well below supervised NMT
Details
Key Idea
Build a common latent space between the two languages
Learn to translate by reconstructing in both domains according to two principles
(i) the model has to be able to reconstruct a sentence in a given language from a noisy version of it, as in standard de-noising auto-encoders
(ii) The model also learns to reconstruct any source sentence given a noisy translation of the same sentence in the target domain, and vice versa
Learning Objective
De-noising Auto-Encoder : Embed a noised sentence into the latent space and reconstruct the original
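The notes don't spell out the noise model; a minimal sketch of a typical corruption (word dropout plus a local shuffle within a window k, as in the paper; function and parameter names are my assumptions) could look like:

```python
import random

def add_noise(tokens, p_drop=0.1, k=3):
    """Corrupt a token list: drop each word with probability p_drop,
    then shuffle words at most ~k positions from their original index."""
    kept = [t for t in tokens if random.random() >= p_drop]
    # local shuffle: sort by original index plus uniform noise in [0, k + 1)
    keys = [i + random.uniform(0, k + 1) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]
```

With k=0 the sort keys stay monotone, so the order is preserved; larger k allows more reordering while keeping words near their original positions.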
Cross-Domain : Minimize reconstruction loss over the chain (source in lang1 -> latent space -> translation in lang2 -> back into latent space -> reconstructed source in lang1)
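A structural sketch of one cross-domain (back-translation) step; all function names here are hypothetical, and translate_l1_to_l2 stands for the frozen model from the previous iteration:

```python
def cross_domain_step(x_l1, translate_l1_to_l2, noise, encode, decode_l1, loss_fn):
    """One cross-domain step: lang1 -> lang2 -> lang1 reconstruction."""
    y_l2 = translate_l1_to_l2(x_l1)      # translation from the frozen previous model
    z = encode(noise(y_l2), lang="l2")   # corrupted translation -> shared latent space
    x_hat = decode_l1(z)                 # reconstruct the original lang1 sentence
    return loss_fn(x_hat, x_l1)          # reconstruction loss drives learning
```

The same step is applied with the two languages swapped, so both translation directions improve over iterations.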
Adversarial : A discriminator tries to identify the language from the latent embedding; the encoder tries to fool it by mapping semantically equivalent sentences to the same latent representation, independent of language
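The adversarial pair of losses can be sketched as below, assuming the discriminator outputs a probability p_l1 that the latent vector came from language 1 (names and the two-language setup are my assumptions):

```python
import math

def discriminator_loss(p_l1, true_lang):
    """Cross-entropy for the discriminator: predict the true source
    language of a latent vector; p_l1 is D's probability of lang1."""
    return -math.log(p_l1 if true_lang == "l1" else 1.0 - p_l1)

def encoder_adversarial_loss(p_l1, true_lang):
    """The encoder is trained on flipped labels, pushing the latent
    representation to be language-independent (i.e. to fool D)."""
    return -math.log(1.0 - p_l1 if true_lang == "l1" else p_l1)
```

The two losses pull in opposite directions: confident correct predictions are cheap for D but expensive for the encoder.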
Final Objective Function
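The notes don't reproduce the objective itself; reconstructed from the three terms above (the λ weights follow the paper's notation, exact symbols are my assumption):

```latex
\mathcal{L}(\theta_{enc}, \theta_{dec}) =
  \lambda_{auto}\big[\mathcal{L}_{auto}(src) + \mathcal{L}_{auto}(tgt)\big]
+ \lambda_{cd}\big[\mathcal{L}_{cd}(src \to tgt) + \mathcal{L}_{cd}(tgt \to src)\big]
+ \lambda_{adv}\,\mathcal{L}_{adv}
```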
Training
Model is bootstrapped from an unsupervised word-by-word translation model
Encoder tries to map the source sentence with noise into shared latent space, and reconstruct as in de-noising auto-encoder.
Decoder learns to reconstruct the input from the latent space, given a language flag
Discriminator tries to identify the source language in an adversarial setting
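The word-by-word bootstrap above can be sketched as a dictionary lookup; the dictionary here is a stand-in (the paper infers it without supervision via word-embedding alignment):

```python
def word_by_word_translate(tokens, bilingual_dict):
    """Naive initial translator: look each word up in an (unsupervised)
    bilingual dictionary, copying out-of-vocabulary words unchanged."""
    return [bilingual_dict.get(w, w) for w in tokens]
```

This crude first model only needs to be good enough to seed the cross-domain loop, which then iteratively improves on it.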
Model Selection
BLEU score for two-way (round-trip) translation is used as an evaluation metric, requiring no parallel data
It shows good correlation with classic BLEU computed against parallel references
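The round-trip criterion can be sketched as below; `metric` stands in for BLEU, which is not implemented here, and all names are assumptions:

```python
def round_trip_score(sents_l1, s2t, t2s, metric):
    """Translate lang1 -> lang2 -> lang1 and score the reconstruction
    against the original input -- no parallel data needed."""
    back = [t2s(s2t(s)) for s in sents_l1]
    scores = [metric(hyp, ref) for hyp, ref in zip(back, sents_l1)]
    return sum(scores) / len(scores)
```

In practice one would average this over both directions and pick the checkpoint maximizing the score.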
Results
Not sure the baselines were really meaningful
Unsupervised does learn something!
Monolingual vs Parallel Corpus
Training on ~10M monolingual sentences matches a supervised model trained on ~100K parallel sentences
Ablation Study
Drop subsets of the training scheme to see which components are critical for learning
De-noising Auto-Encoder and Cross-Domain are both critical
Personal Thoughts
Great work on unsupervised NMT
Better than Cho's paper because the whole pipeline is fully differentiable
Link : https://arxiv.org/pdf/1711.00043.pdf Authors : Lample et al. 2017