Proposes an architectural design that shortens the distance between source and target words in the NMT task
Analyzes how alignment and over-/under-translation improve as a result
Details
Introduction
In NMT, source words are encoded first; a weighted average (via attention) of the encoded source representations is fed into the decoder, together with the previously decoded target token, to generate the next target token
Authors point out that shortening the distance between target tokens and source tokens leads to a better model
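The attention-and-decode step described above can be sketched as follows. This is a minimal numpy sketch; the single projection matrix `W_att` and the concatenation at the end are simplifying assumptions, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step(enc_states, prev_target_emb, dec_state, W_att):
    """One attention + input-assembly step of an attentional decoder.

    enc_states: (src_len, hidden)  encoded source representations
    prev_target_emb: (emb,)        embedding of the previously decoded token
    dec_state: (hidden,)           current decoder hidden state
    W_att: (hidden, hidden)        hypothetical attention projection
    """
    # score each source position against the decoder state
    scores = enc_states @ (W_att @ dec_state)      # (src_len,)
    weights = softmax(scores)                      # attention distribution
    context = weights @ enc_states                 # weighted average of source states
    # context + previous target embedding drive the next-token prediction
    new_input = np.concatenate([context, prev_target_emb])
    return new_input, weights
```
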
Three Strategies
Source-side bridging
concatenate the source word embeddings onto the final encoder representations
Target-side bridging
feed a single (attention-selected) source embedding as a direct input to the decoder
Direct bridging
add a training loss that minimizes the distance between each target word embedding and the embedding of its most relevant source word, selected via attention
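The direct-bridging loss might look roughly like the sketch below. This is a hypothetical simplification: it assumes source and target embeddings share a dimension (the paper may learn a transformation between the two spaces) and uses a hard argmax over attention weights to pick the most relevant source word:

```python
import numpy as np

def direct_bridging_loss(tgt_embs, src_embs, attn):
    """Auxiliary loss pulling each target embedding toward its
    most-attended source embedding.

    tgt_embs: (tgt_len, dim)  target word embeddings
    src_embs: (src_len, dim)  source word embeddings
    attn:     (tgt_len, src_len) attention weights
    """
    best = attn.argmax(axis=1)          # most relevant source position per target word
    selected = src_embs[best]           # (tgt_len, dim)
    diff = tgt_embs - selected
    return 0.5 * np.sum(diff ** 2)      # squared-distance penalty
```

At training time this term would be added (with some weight) to the usual cross-entropy objective, so the gradient also shortens the embedding-space distance between aligned words.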
Results
NIST Zh-En dataset (1.25M sentence pairs)
Bi-Directional GRU with Attention
All bridging variants improve the BLEU score, but SMT is very competitive with RNNSearch on this dataset
Analysis
Alignment
percentage of target EOS tokens attending to the source EOS token is much higher with direct bridging
POS-based alignment improves slightly
Alignment Error Rate (AER) improves
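AER here is presumably the standard Och & Ney metric over sure (S) and possible (P) gold alignment links, AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|); a minimal implementation:

```python
def aer(sure, possible, hyp):
    """Alignment Error Rate (lower is better).

    sure, possible, hyp: sets of (src_idx, tgt_idx) alignment links,
    with sure being a subset of possible.
    """
    a_s = len(hyp & sure)        # hypothesis links that are sure
    a_p = len(hyp & possible)    # hypothesis links that are at least possible
    return 1.0 - (a_s + a_p) / (len(hyp) + len(sure))
```
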
Over-translation
ROT (rate of over-translation), a metric proposed by [Li et al 2017](), is computed as the number of aligned source positions (counted with repetition) divided by the number of words in the source word set
ROT decreased by 15% with the direct bridging model
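Under that reading (aligned source positions counted with repetition, divided by the size of the source word set), ROT could be computed roughly as below. This is a hedged sketch of the note's description, not necessarily Li et al.'s exact definition:

```python
def rot(aligned_src_positions):
    """Rate of over-translation (sketch).

    aligned_src_positions: list of source word indices, one entry per
    time a source word is translated (so repeats indicate over-translation).
    A value of 1.0 means every translated source word was translated once.
    """
    return len(aligned_src_positions) / len(set(aligned_src_positions))
```
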
Under-translation
Uses 1-gram BLEU as a proxy for under-translation; 1-gram BLEU improves with direct bridging but remains lower than SMT's
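1-gram BLEU is essentially clipped unigram precision (ignoring BLEU's brevity penalty), which is why it serves as a proxy for how much of the source content made it into the translation; a minimal sketch:

```python
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision (brevity penalty omitted).

    candidate, reference: lists of tokens.
    """
    cand, ref = Counter(candidate), Counter(reference)
    # clip each candidate count by its count in the reference
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(1, sum(cand.values()))
```
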
Personal Thoughts
The idea of reducing the maximum path length between target and source tokens seems valid
The in-depth analysis of alignment and over-/under-translation was impressive
Over-/under-translation is notoriously hard to tackle
Link: https://arxiv.org/pdf/1711.05380v4.pdf
Authors: Kuang et al. 2018