Abstract
Use the attention distribution of the NMT model to estimate translation quality
Add attention-filtered synthetic data to the existing parallel corpus to improve NMT translation quality, measured in BLEU
Details
Attention-based Metrics
Coverage Deviation Penalty
aims to penalize input tokens whose total received attention deviates too far from 1.0, i.e., tokens that are over- or under-attended during translation
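A minimal NumPy sketch of the idea, assuming an (I, J) attention matrix with one row per output token; the log-squared-deviation form follows the paper, but the function name and the final exponentiation into a (0, 1] score are my own framing:

```python
import numpy as np

def coverage_deviation_penalty(attn: np.ndarray) -> float:
    """CDP over an (I, J) attention matrix: I output tokens (rows,
    each a distribution over inputs), J input tokens (columns).

    For each input token, sum the attention it received across all
    output steps and penalize the squared deviation of that sum from 1.
    """
    J = attn.shape[1]
    coverage = attn.sum(axis=0)  # total attention received per input token
    log_penalty = -np.log(1.0 + (1.0 - coverage) ** 2).sum() / J
    return float(np.exp(log_penalty))  # 1.0 = perfect coverage, -> 0.0 otherwise
```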
Absentmindedness Penalty
measures the dispersion of attention via the entropy of the predicted attention distribution. As with the CDP, we want the penalty value to be 1.0 for the lowest entropy (sharply focused attention) and to head towards 0.0 for higher entropies
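A matching sketch under the same (I, J) matrix assumptions; as I understand the paper, this penalty is applied in both directions, over the output-token rows and, after transposing and renormalizing, over the input-token columns:

```python
import numpy as np

def absentmindedness_penalty(attn: np.ndarray, eps: float = 1e-12) -> float:
    """AP over an (I, J) attention matrix: the mean negative entropy
    of the per-output-token attention rows.

    A one-hot row has zero entropy, so the score is exp(0) = 1.0;
    dispersed rows have high entropy and push the score towards 0.0.
    """
    I = attn.shape[0]
    neg_entropy = (attn * np.log(attn + eps)).sum() / I  # always <= 0
    return float(np.exp(neg_entropy))
```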
Training NMT with additional data
back-translation alone already improves quality
but filtering the synthetic data with the attention-based confidence score improves it further (see the sketch after this list)
it helps especially in the morphologically rich -> morphologically simpler language direction
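Putting the two penalties together, a hypothetical filtering pass over back-translated data, reusing the CDP/AP helpers sketched above; the product combination and the 0.3 threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def confidence(attn: np.ndarray) -> float:
    # Combine the penalties sketched above into one (0, 1] score.
    # Multiplying the exponentiated penalties equals summing them in
    # log space; the paper's exact combination may differ.
    attn_in = attn.T / (attn.T.sum(axis=1, keepdims=True) + 1e-12)
    return (coverage_deviation_penalty(attn)
            * absentmindedness_penalty(attn)      # AP over output tokens
            * absentmindedness_penalty(attn_in))  # AP over input tokens

def filter_synthetic(pairs, attn_matrices, threshold=0.3):
    """Keep only back-translated sentence pairs whose attention-based
    confidence clears the threshold (0.3 is a made-up example value)."""
    return [pair for pair, attn in zip(pairs, attn_matrices)
            if confidence(attn) >= threshold]
```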
Personal Thoughts
Enriching and smoothing the training data via back-translation, a copied corpus, sequence-level knowledge distillation, and an attention-filtered synthetic corpus is a very strong technique
Link: https://arxiv.org/pdf/1710.03743.pdf
Authors: Rikters et al., 2017