Abstract
propose a simple and effective method to incorporate retrieved sentence pairs into the NMT decoding process
use a search engine to retrieve sentence pairs similar to the input sentence
collect n-gram translation pieces from the target side where the similarity and alignment scores are high
reward these translation pieces during the NMT beam search
+6.0 BLEU improvement on a narrow-domain translation task
careful algorithm design achieves accuracy, speed, and simplicity of implementation
Details
Problem
NMT is weak at translating low-frequency words or phrases
Retrieval-based Model
an active research area in which NMT retrieves sentence pairs from the training corpus during translation
it augments the parametric NMT model with a non-parametric translation memory, allowing for increased capacity
Two main approaches
Li et al. 2016 and Farajian et al. 2017 use the retrieved sentence pairs to fine-tune the parameters of the NMT model
Gu et al. 2017 use the retrieved sentence pairs as additional inputs to NMT decoding
Contribution
existing methods perform well, but add significant complexity and computational/memory cost to the decoding process
propose a simple and efficient method that collects n-grams from the retrieved target sentences (translation pieces), calculates a pseudo-probability to weight each translation piece, and rewards the NMT model for outputting translation pieces during beam search
Guiding NMT with Translation Pieces
use the Lucene search engine to retrieve M sentence pairs whose source sides have high n-gram overlap with the input sentence
among all n-grams in the retrieved target sentences, collect translation pieces and score them by the similarity between the input sentence X and the retrieved source sentence X^m: s_X(u) = max over retrieved pairs containing u of simi(X, X^m), where simi(X, X^m) = 1 - dist(X, X^m) / max(|X|, |X^m|) and dist is word-level edit distance
during beam search, candidate outputs that complete a translation piece receive an additive reward proportional to s_X(u)
the reward step is implemented efficiently: it does not traverse the whole target vocabulary V, but only the target words that belong to translation pieces (see the sketch below)
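A minimal sketch of the collect-and-reward pipeline described above, assuming retrieval has already happened (a list of retrieved pairs with precomputed word alignments stands in for Lucene) and simplifying matched-word detection to plain word overlap; the function names (collect_translation_pieces, rewarded_scores) and the reward weight lam are illustrative, not from the paper's code:

```python
from collections import defaultdict

def edit_distance(a, b):
    # word-level Levenshtein distance between token lists a and b
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def simi(x, xm):
    # sentence similarity from the paper: 1 - dist(X, X^m) / max(|X|, |X^m|)
    return 1.0 - edit_distance(x, xm) / max(len(x), len(xm))

def collect_translation_pieces(x, retrieved, max_n=4):
    """x: tokenized input sentence; retrieved: (x_m, y_m, align) triples,
    where align maps target positions to source positions.
    Returns {n-gram piece u: score s_X(u)}."""
    scores = defaultdict(float)
    for xm, ym, align in retrieved:
        sim = simi(x, xm)
        # simplification: treat any source word also present in x as "matched";
        # the paper derives matched words from the edit-distance alignment
        matched = {i for i, w in enumerate(xm) if w in set(x)}
        for i in range(len(ym)):
            for n in range(1, max_n + 1):
                if i + n > len(ym):
                    break
                # keep the n-gram only if all its words align to matched source words
                if all(align.get(k, -1) in matched for k in range(i, i + n)):
                    u = tuple(ym[i:i + n])
                    scores[u] = max(scores[u], sim)  # s_X(u) = max over pairs
    return dict(scores)

def rewarded_scores(log_probs, hyp, pieces, lam=1.0):
    """Add the translation-piece reward at one beam-search step; lam is an
    illustrative reward weight. Only words ending some piece are touched,
    never the full vocabulary V."""
    out = dict(log_probs)  # {candidate next word: log p from the NMT model}
    for u, s in pieces.items():
        *prefix, last = u
        # the reward fires if the current hypothesis ends with the piece's prefix
        if last in out and (not prefix or tuple(hyp[-len(prefix):]) == tuple(prefix)):
            out[last] += lam * s
    return out

# toy usage
x = "der Hund schläft".split()
retrieved = [("der Hund bellt".split(), "the dog barks".split(), {0: 0, 1: 1, 2: 2})]
pieces = collect_translation_pieces(x, retrieved)
# pieces: {('the',): 0.67, ('the', 'dog'): 0.67, ('dog',): 0.67}  (sim = 1 - 1/3)
scores = rewarded_scores({"the": -1.2, "cat": -1.0}, [], pieces)
```

At each beam-search step, rewarded_scores touches only the words that end some collected piece, which is what keeps the reward step cheap relative to a full-vocabulary traversal.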
Experiments
corpus : JRC-Acquis, ~670k sentence pairs in a narrow (EU legal) domain
result : up to +6.0 BLEU over the baseline NMT
Ablation Experiments
Effect of look-up corpus
the similarity between the test set and the look-up corpus is an important factor in the performance of guided NMT
on the WMT17 En-De news translation task, the method does not achieve significant improvements over the baseline due to the difference between train and test data distributions; Table 8 shows that WMT's similarity distribution is concentrated in the 0.2~0.4 range (a rough way to compute such a histogram is sketched below)
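To make the look-up-corpus effect concrete, a Table-8-style similarity distribution can be approximated by bucketing each test sentence by its best match in the look-up corpus. A rough sketch, assuming the simi helper from the earlier sketch is in scope; similarity_histogram and the 0.2 bucket width are my own choices:

```python
from collections import Counter

def similarity_histogram(test_sents, lookup_sents, width=0.2):
    # bucket each test sentence by its best simi() against the look-up corpus
    # (brute force over the corpus; the paper uses a search engine to prefilter)
    buckets = Counter()
    for x in test_sents:
        best = max(simi(x.split(), xm.split()) for xm in lookup_sents)
        lo = min(int(best // width) * width, 1.0 - width)
        buckets["%.1f-%.1f" % (lo, lo + width)] += 1
    return dict(buckets)
```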
Infrequent n-grams
guided NMT produces more infrequent n-grams (corpus count < 5) in its decoded output than baseline NMT, showing that the algorithm meets its original motivation well (a counting sketch follows below)
if an n-gram never occurs in the look-up corpus, no improvement is seen, because no reward can be added
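A sketch of how the infrequent-n-gram analysis could be reproduced: build n-gram counts over the look-up corpus, then collect the unique n-grams in each system's decoded output that fall under the threshold; the function names and exact counting protocol are my guesses, not the paper's:

```python
from collections import Counter

def ngram_counts(sents, max_n=4):
    # frequency of every n-gram (n = 1..max_n) in the look-up corpus
    counts = Counter()
    for sent in sents:
        w = sent.split()
        for n in range(1, max_n + 1):
            counts.update(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
    return counts

def infrequent_output_ngrams(output_sents, counts, max_n=4, threshold=5):
    # unique n-grams in the decoded output with corpus count < threshold
    rare = set()
    for sent in output_sents:
        w = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(w) - n + 1):
                g = tuple(w[i:i + n])
                if counts[g] < threshold:
                    rare.add(g)
    return rare
```

Comparing len(infrequent_output_ngrams(...)) for guided vs. baseline outputs gives the kind of comparison summarized above.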
vs Search Engine Guided NMT by Gu et al 2017
the proposed approach performs better and is faster during decoding
SGNMT requires encoding/decoding of the retrieved sentence pairs at test time, which is costly
Personal Thoughts
well written paper, well experimented, in-depth analysis
Algorithm 2 is an efficient method to reward/punish n-gram outputs in beam search
the attention to practical implementation details was impressive
hope infrequent n-grams can also be handled well in general-purpose translation tasks (e.g., the WMT news task)
Link : https://arxiv.org/pdf/1804.02559v1.pdf
Authors : Zhang et al. 2018