SYSTRAN’s submission to the WMT 2017 shared news translation task for English-German
Back-translation and Hyper-specialization
Uses OpenNMT
Details
WMT 2017 News Translation Task
Data: 4.6M-sentence parallel corpus
Training
Trained on an Nvidia GTX 1080 with minibatches of ~64 sentences
SGD with learning rate 0.1 and annealing (decay) rate 0.7, as sketched below
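A minimal sketch of the SGD-with-annealing schedule, assuming a PyTorch-style training loop; the model, data, and the epoch at which decay starts are placeholders, and only the 0.1 learning rate and 0.7 decay factor come from the note above.

```python
import torch
import torch.nn as nn

# Placeholder model; in the paper this would be the OpenNMT seq2seq model.
model = nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # learning rate from the note

decay = 0.7               # annealing rate from the note
start_decay_epoch = 8     # hypothetical: epoch at which decay kicks in

for epoch in range(1, 14):                 # 13 epochs, as in the note
    # ... run one epoch of minibatch (~64 sentences) training here ...
    if epoch >= start_decay_epoch:
        for group in optimizer.param_groups:
            group["lr"] *= decay           # multiply the learning rate by 0.7
```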
Back Translation
Translate target-language monolingual data back into the source language and use the resulting synthetic pairs as additional parallel data
After 13 epochs of training on the original 4.5M parallel corpus, training continues on 4.5M synthetically back-translated sentences plus the original 4.5M (see the sketch after this list)
It improves performance!
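A rough sketch of how the back-translated corpus could be assembled; `translate_target_to_source` stands in for a reverse (target-to-source) NMT model and is purely hypothetical, as are the toy sentences.

```python
from typing import Callable, List, Tuple

def build_backtranslated_corpus(
    target_monolingual: List[str],
    translate_target_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual target sentence with a synthetic source sentence."""
    return [(translate_target_to_source(tgt), tgt) for tgt in target_monolingual]

# Usage: mix the synthetic pairs with the original parallel data and keep training.
original_parallel = [("a house", "ein Haus")]     # toy (source, target) pair
synthetic = build_backtranslated_corpus(
    ["ein Baum"],
    translate_target_to_source=lambda s: s,       # dummy reverse model
)
training_data = original_parallel + synthetic
```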
Data Selection via LM
A smaller, domain-relevant subset of the data is used to fine-tune the model
Sentences are scored with two 3-gram LMs, one trained on a news corpus and one on a random sample; when the cross-entropy difference between them is large, the sentence is treated as news-related and added to the fine-tuning corpus
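A sketch of the cross-entropy-difference selection described above; the two per-sentence cross-entropy callables stand in for the news and random-sample 3-gram LMs, and the threshold is an assumption.

```python
from typing import Callable, List

def select_news_sentences(
    sentences: List[str],
    xent_news: Callable[[str], float],    # per-sentence cross-entropy under the news 3-gram LM
    xent_random: Callable[[str], float],  # per-sentence cross-entropy under the random-sample LM
    threshold: float = 1.0,               # hypothetical cutoff
) -> List[str]:
    """Keep sentences that the news LM models much better than the generic LM."""
    selected = []
    for s in sentences:
        # A large positive difference means the sentence looks news-related.
        if xent_random(s) - xent_news(s) > threshold:
            selected.append(s)
    return selected
```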
Hyper-specialization
A ~25K news-related set is used for fine-tuning with learning rate 0.7 (see the sketch below)
Improves BLEU by +0.3 to +0.5
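A minimal sketch of the hyper-specialization step, assuming a PyTorch-style loop: resume from the converged model and run a few short passes over the small news-related set at a high learning rate. Only the 0.7 learning rate and the ~25K set come from the note; the model, data, and number of passes are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder standing in for the converged NMT model; a real run would first
# load the baseline checkpoint, e.g. model.load_state_dict(torch.load("baseline.pt")).
model = nn.Linear(512, 512)

optimizer = torch.optim.SGD(model.parameters(), lr=0.7)        # learning rate from the note
loss_fn = nn.MSELoss()                                         # stand-in objective

news_batches = [(torch.randn(64, 512), torch.randn(64, 512))]  # stand-in for the ~25K news set

for epoch in range(2):                    # a couple of quick passes (assumption)
    for x, y in news_batches:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```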
Personal Thoughts
Good to see SYSTRAN openly participating in and contributing to WMT 2017
The amount of data is a real strength once it is augmented via back-translation, distillation, and monolingual corpora!
Hyper-specialization is a competition-fit strategy for squeezing out performance, but it likely amounts to overfitting
Link: https://arxiv.org/pdf/1709.03814.pdf
Authors: Deng et al., 2017