Abstract
propose a simple and effective method to incorporate retrieved sentence pairs into the NMT decoding process
use a search engine to retrieve sentence pairs similar to the input sentence
collect n-gram translation pieces from the target side where the similarity and alignment scores are high
reward these translation pieces during the NMT beam search
+6.0 BLEU improvement on a narrow-domain translation task
careful algorithm design achieves accuracy, speed, and simplicity of implementation
Details
Problem
NMT is weak at translating low-frequency words or phrases
Retrieval-based Model
an active research area in which NMT retrieves sentence pairs from the training corpus during translation
it augments the parametric NMT model with a non-parametric translation memory, allowing for increased capacity
Two main approaches
Li et al. 2016 and Farajian et al. 2017 use the retrieved sentence pairs to fine-tune the parameters of the NMT model
Gu et al. 2017 use the retrieved sentence pairs as additional inputs to NMT decoding
Contribution
existing methods perform well, but add significant complexity and computational/memory cost to the decoding process
propose a simple and efficient method that collects n-grams from the retrieved target sentences (translation pieces), calculates a pseudo-probability to weight each translation piece, and rewards the NMT model for outputting translation pieces during beam search
Guiding NMT with Translation Pieces
use the Lucene search engine to retrieve M sentence pairs whose source sides have high n-gram overlap with the input sentence
among all n-grams in the retrieved target sentences, collect translation pieces and score them by the similarity between the input sentence X and the retrieved source sentence X^m: s_X(u) = max over retrieved pairs containing u of simi(X, X^m), where simi(X, X^m) = 1 - dist(X, X^m) / max(|X|, |X^m|) and dist is word-level edit distance
during beam search, candidate outputs that complete a translation piece receive an additive reward proportional to s_X(u)
the reward step is implemented efficiently: it does not traverse the whole target vocabulary V, but only the target words that belong to translation pieces (see the sketch below)
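A minimal sketch of the collect-and-reward pipeline described above, assuming retrieval has already happened (a list of retrieved pairs with precomputed word alignments stands in for Lucene) and simplifying matched-word detection to plain word overlap; the function names (collect_translation_pieces, rewarded_scores) and the reward weight lam are illustrative, not from the paper's code:

```python
from collections import defaultdict

def edit_distance(a, b):
    # word-level Levenshtein distance between token lists a and b
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def simi(x, xm):
    # sentence similarity from the paper: 1 - dist(X, X^m) / max(|X|, |X^m|)
    return 1.0 - edit_distance(x, xm) / max(len(x), len(xm))

def collect_translation_pieces(x, retrieved, max_n=4):
    """x: tokenized input sentence; retrieved: (x_m, y_m, align) triples,
    where align maps target positions to source positions.
    Returns {n-gram piece u: score s_X(u)}."""
    scores = defaultdict(float)
    for xm, ym, align in retrieved:
        sim = simi(x, xm)
        # simplification: treat any source word also present in x as "matched";
        # the paper derives matched words from the edit-distance alignment
        matched = {i for i, w in enumerate(xm) if w in set(x)}
        for i in range(len(ym)):
            for n in range(1, max_n + 1):
                if i + n > len(ym):
                    break
                # keep the n-gram only if all its words align to matched source words
                if all(align.get(k, -1) in matched for k in range(i, i + n)):
                    u = tuple(ym[i:i + n])
                    scores[u] = max(scores[u], sim)  # s_X(u) = max over pairs
    return dict(scores)

def rewarded_scores(log_probs, hyp, pieces, lam=1.0):
    """Add the translation-piece reward at one beam-search step; lam is an
    illustrative reward weight. Only words ending some piece are touched,
    never the full vocabulary V."""
    out = dict(log_probs)  # {candidate next word: log p from the NMT model}
    for u, s in pieces.items():
        *prefix, last = u
        # the reward fires if the current hypothesis ends with the piece's prefix
        if last in out and (not prefix or tuple(hyp[-len(prefix):]) == tuple(prefix)):
            out[last] += lam * s
    return out

# toy usage
x = "der Hund schläft".split()
retrieved = [("der Hund bellt".split(), "the dog barks".split(), {0: 0, 1: 1, 2: 2})]
pieces = collect_translation_pieces(x, retrieved)
# pieces: {('the',): 0.67, ('the', 'dog'): 0.67, ('dog',): 0.67}  (sim = 1 - 1/3)
scores = rewarded_scores({"the": -1.2, "cat": -1.0}, [], pieces)
```

At each beam-search step, rewarded_scores touches only the words that end some collected piece, which is what keeps the reward step cheap relative to a full-vocabulary traversal.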
Experiments
corpus : JRC-Acquis, ~670k sentence pairs in a narrow (EU legal) domain
result : up to +6.0 BLEU over the baseline NMT
Ablation Experiments
Effect of look-up corpus
the similarity between the test set and the look-up corpus is an important factor in the performance of guided NMT
on the WMT17 En-De news translation task, the method does not achieve significant improvements over the baseline due to the difference between train and test data distributions; Table 8 shows that WMT's similarity distribution is concentrated in the 0.2~0.4 range (a rough way to compute such a histogram is sketched below)
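To make the look-up-corpus effect concrete, a Table-8-style similarity distribution can be approximated by bucketing each test sentence by its best match in the look-up corpus. A rough sketch, assuming the simi helper from the earlier sketch is in scope; similarity_histogram and the 0.2 bucket width are my own choices:

```python
from collections import Counter

def similarity_histogram(test_sents, lookup_sents, width=0.2):
    # bucket each test sentence by its best simi() against the look-up corpus
    # (brute force over the corpus; the paper uses a search engine to prefilter)
    buckets = Counter()
    for x in test_sents:
        best = max(simi(x.split(), xm.split()) for xm in lookup_sents)
        lo = min(int(best // width) * width, 1.0 - width)
        buckets["%.1f-%.1f" % (lo, lo + width)] += 1
    return dict(buckets)
```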
Infrequent n-grams
guided NMT produces more infrequent n-grams (corpus count < 5) in its decoded output than baseline NMT, showing that the algorithm meets its original motivation well (a counting sketch follows below)
if an n-gram never occurs in the look-up corpus, no improvement is seen, because no reward can be added
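A sketch of how the infrequent-n-gram analysis could be reproduced: build n-gram counts over the look-up corpus, then collect the unique n-grams in each system's decoded output that fall under the threshold; the function names and exact counting protocol are my guesses, not the paper's:

```python
from collections import Counter

def ngram_counts(sents, max_n=4):
    # frequency of every n-gram (n = 1..max_n) in the look-up corpus
    counts = Counter()
    for sent in sents:
        w = sent.split()
        for n in range(1, max_n + 1):
            counts.update(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
    return counts

def infrequent_output_ngrams(output_sents, counts, max_n=4, threshold=5):
    # unique n-grams in the decoded output with corpus count < threshold
    rare = set()
    for sent in output_sents:
        w = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(w) - n + 1):
                g = tuple(w[i:i + n])
                if counts[g] < threshold:
                    rare.add(g)
    return rare
```

Comparing len(infrequent_output_ngrams(...)) for guided vs. baseline outputs gives the kind of comparison summarized above.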
vs Search Engine Guided NMT by Gu et al 2017
the proposed approach performs better and is faster during decoding
SGNMT requires encoding/decoding of the retrieved sentence pairs at test time, which is costly
Personal Thoughts
well written paper, well experimented, in-depth analysis
Algorithm 2 is an efficient method to reward/punish n-gram outputs in beam search
the attention to practical implementation details was impressive
hope infrequent n-grams can also be handled well in general-purpose translation tasks (e.g., the WMT news task)
Link : https://arxiv.org/pdf/1804.02559v1.pdf
Authors : Zhang et al. 2018