Helsinki NMT ranked 1st in the WMT 2017 News Translation task for English-Finnish
Details
Arsenals
Layer Normalization : preliminary experiments showed no improvement
Variational Dropout : dropout on recurrent states, with the same mask reused at every time step
Context Gates : achieved better cross-entropy, but no improvement in BLEU or chrF3
Coverage Decoder : preliminary experiments showed no improvement
Ensemble : Proper ensemble is best, but Parameter Averaging also helps
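Of the techniques above, variational dropout is the easiest to illustrate concretely. A minimal NumPy sketch of the idea (function name, toy dimensions, and the simple tanh recurrence are my own, not from the paper): one dropout mask is sampled per sequence and reused at every recurrent step, instead of resampling per step as in standard dropout.

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_dropout_mask(batch, hidden, p, rng):
    # One Bernoulli keep-mask per sequence, scaled by 1/keep_prob;
    # reused across all time steps (standard dropout would resample).
    keep = 1.0 - p
    return rng.binomial(1, keep, size=(batch, hidden)) / keep

batch, hidden, steps = 2, 4, 3
W = rng.normal(size=(hidden, hidden)) * 0.1
x = rng.normal(size=(steps, batch, hidden))
mask = variational_dropout_mask(batch, hidden, p=0.5, rng=rng)

h = np.zeros((batch, hidden))
for t in range(steps):
    # the *same* mask is applied to the recurrent state at every step
    h = np.tanh((h * mask) @ W.T + x[t])
```

With p=0.5, each kept unit is scaled by 2 so the expected activation is unchanged.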
Experiments
Choice of Segmentation Strategy
BPE in the decoder performs well; a char-level decoder scores higher in chrF3
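As a refresher on the BPE segmentation compared above, here is a toy sketch of the merge-learning loop (function names and the example word counts are mine, not the paper's): starting from characters, the most frequent adjacent symbol pair is merged repeatedly.

```python
from collections import Counter

def merge_word(syms, pair, merged):
    # Replace every adjacent occurrence of `pair` in a symbol list with `merged`.
    out, i = [], 0
    while i < len(syms):
        if i + 1 < len(syms) and (syms[i], syms[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(syms[i])
            i += 1
    return out

def learn_bpe(word_freqs, num_merges):
    # Toy BPE: greedily merge the most frequent adjacent symbol pair.
    vocab = {tuple(w): c for w, c in word_freqs.items()}  # start char-level
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for syms, count in vocab.items():
            for a, b in zip(syms, syms[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = "".join(best)
        vocab = {tuple(merge_word(list(s), best, merged)): c
                 for s, c in vocab.items()}
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
```

A char-level decoder corresponds to skipping the merges entirely and emitting one symbol per character.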
Ensemble
Proper ensemble is best, but parameter averaging helps
In the dev set, they found many contractions (wouldn't, etc.) that were not present in the training set, so they de-tokenized them
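The ensemble-vs-averaging distinction above can be sketched in a few lines of NumPy (function names and toy values are mine): a proper ensemble averages the models' output distributions at decode time, while parameter averaging collapses several checkpoints into a single set of weights.

```python
import numpy as np

def average_checkpoints(checkpoints):
    # Parameter averaging: average each tensor across saved checkpoints,
    # yielding one model (a cheap approximation of an ensemble).
    return {name: np.mean([c[name] for c in checkpoints], axis=0)
            for name in checkpoints[0]}

def ensemble_probs(prob_dists):
    # Proper ensemble: run every model and average their output
    # distributions at each decoding step.
    return np.mean(prob_dists, axis=0)

ckpts = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
avg = average_checkpoints(ckpts)
probs = ensemble_probs([[0.2, 0.8], [0.6, 0.4]])
```

Parameter averaging keeps decoding cost at a single forward pass, which is why it is reported here as a helpful but weaker substitute for the full ensemble.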
Personal Thoughts
Lots of ideas were tested against a preliminary baseline, and the effective ones were applied to the large-scale data
Language-specific tunings, such as dev-set tuning and an exhaustive search over enc/dec segmentation strategies, led to first place in the English-Finnish task
Link : https://arxiv.org/pdf/1708.05942.pdf
Authors : Ostling et al. 2017