kweonwooj opened 5 years ago
Link : https://openreview.net/pdf?id=r1gGpjActQ Authors : Li et al 2018
## Abstract
- hints from pre-trained AutoRegressive Translation (ART) model to train Non-AutoRegressive Translation (NART) model
  - hints from hidden state
  - hints from word alignment

## Details
### Introduction
- prior NART work uses fertilities from SMT model and copies source tokens to initialize decoder states
- this paper uses hints from pre-trained ART model

### Motivation
- NART models tend to generate incoherent phrases and miss meaningful tokens on the source side
### Hint-based NMT
- Hints from hidden state
- Hints from word alignment
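A rough sketch of how such teacher-to-student hints could be turned into training losses. These are illustrative forms I am assuming, not the paper's exact equations; the function names, the hinge penalty, and the KL direction are my own choices:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def hidden_state_hint_loss(art_h, nart_h):
    """Hinge-style penalty (illustrative, not the paper's exact form):
    punish the NART student when adjacent decoder states stay similar
    even though the ART teacher's corresponding states do not --
    a proxy for the 'incoherent phrase' symptom."""
    loss = 0.0
    for t in range(len(art_h) - 1):
        sim_teacher = cosine(art_h[t], art_h[t + 1])
        sim_student = cosine(nart_h[t], nart_h[t + 1])
        loss += max(0.0, sim_student - sim_teacher)
    return loss / (len(art_h) - 1)

def alignment_hint_loss(art_attn, nart_attn, eps=1e-8):
    """KL(teacher || student) over encoder-decoder attention rows,
    nudging NART attention toward the ART model's soft word alignment."""
    t, s = art_attn + eps, nart_attn + eps
    return float(np.mean(np.sum(t * np.log(t / s), axis=-1)))
```

In training these terms would presumably be added to the usual NLL loss with tuned weights.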
- Initial Decoder State (z) : linear combination of source embeddings
- Multihead Positional Attention : additional sub-layer in decoder to re-configure the positions

### Inference Tricks
- Length Prediction : instead of predicting target length, use constant bias C obtained from train corpus (no computational overhead)
- Length Range Prediction : instead of predicting a fixed length, predict over a range of target lengths
- ART re-scoring : use ART model to re-score multiple target candidates, to select the final one (re-scoring can take place in non-autoregressive manner)

### Overall Performance
## Personal Thoughts
- Inference Tricks seem to be a strong contribution that the authors do not explicitly point out
- bad luck
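The inference tricks above can be sketched end-to-end; `bias_C`, `length_range`, and `art_score` here are hypothetical stand-ins for the paper's corpus statistics and trained ART scorer:

```python
def candidate_lengths(src_len, bias_C=2, length_range=4):
    """Length Range Prediction: enumerate target lengths around
    src_len + C (C a constant bias measured on the train corpus)
    instead of committing to a single predicted length.
    bias_C / length_range values here are illustrative."""
    center = src_len + bias_C
    return [n for n in range(center - length_range, center + length_range + 1)
            if n > 0]

def rescore(candidates, art_score):
    """ART re-scoring: each candidate's target tokens are already fully
    known, so the ART model can score every candidate in one parallel
    (teacher-forced) pass; keep the highest-scoring one."""
    return max(candidates, key=art_score)
```

The NART model would decode one candidate per length in `candidate_lengths(...)`, then `rescore(...)` picks the final output, which is why the re-scoring step stays non-autoregressive.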