Proposes a new training approach that uses both the sentence and its bag-of-words as targets during the training stage
Adding the bag-of-words target encourages the model to generate potentially correct sentences, instead of punishing all non-reference sentences equally as incorrect
On the NIST Zh-En set, BLEU improves by 4.55
Details
Introduction
Problem
NMT training treats the target sentence as the only gold label, so semantically or syntactically close sentences are penalized just as heavily as completely incorrect ones.
Solution
add a bag-of-words loss that rewards the model when correct tokens are generated, even at incorrect positions
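A toy illustration (mine, not from the paper) of why a bag-of-words view gives credit for correct tokens in wrong positions: a reordered hypothesis that gets almost no credit position-by-position still matches the reference perfectly at the multiset level.

```python
from collections import Counter

reference = "the cat sat on the mat".split()
hypothesis = "on the mat the cat sat".split()  # correct tokens, wrong order

# Position-wise comparison (roughly how word-by-word MLE "sees" it): no credit.
positional_matches = sum(r == h for r, h in zip(reference, hypothesis))
print(positional_matches, "/", len(reference))   # 0 / 6

# Bag-of-words comparison: every token is accounted for.
bow_overlap = sum((Counter(reference) & Counter(hypothesis)).values())
print(bow_overlap, "/", len(reference))          # 6 / 6
```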
Overview
the bag-of-words loss is computed by summing the decoder's softmax outputs over all time steps and comparing the aggregate against the reference bag-of-words under an MLE objective
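A minimal sketch of that computation as described in the note above (assuming PyTorch; the per-step softmax distributions are summed over time and renormalized, and the loss is the negative log-likelihood of each word in the reference bag; names are my own, not the paper's):

```python
import torch

def bag_of_words_loss(decoder_logits, ref_bow_ids, eps=1e-8):
    """
    decoder_logits: (T, V) logits from the decoder, one row per output step.
    ref_bow_ids:    1-D LongTensor of vocabulary ids forming the reference
                    bag-of-words (duplicates kept, so repeated words count more).
    Returns the negative log-likelihood of the reference bag under the
    time-aggregated output distribution.
    """
    step_probs = torch.softmax(decoder_logits, dim=-1)   # (T, V) per-step softmax
    summed = step_probs.sum(dim=0)                       # sum over all decoding steps
    bow_probs = summed / summed.sum()                    # renormalize to a distribution
    nll = -torch.log(bow_probs[ref_bow_ids] + eps)       # NLL of each reference word
    return nll.sum()

# Toy usage: T=4 decoding steps over a vocabulary of size V=10.
logits = torch.randn(4, 10)
ref_bow = torch.tensor([2, 5, 5, 7])
print(bag_of_words_loss(logits, ref_bow).item())
```

In training this term would be added to the usual word-by-word cross-entropy (the note describes both the sentence and the bag-of-words as targets); the exact aggregation and weighting may differ from this sketch.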
Result
The NIST Zh-En training set has 1.25M sentence pairs, which is quite small for NMT
The bag-of-words target improves the model by +4.55 BLEU, but note that SMT (Moses) achieves almost the same performance as the Seq2Seq+Att baseline.
Personal Thoughts
Good problem definition, and the solution is both intuitive and straightforward
NMT usually performs much better than SMT, but on this dataset the SMT Moses and Seq2Seq+Att baselines are too close in performance. So what exactly is the bag-of-words target improving?
Link: https://arxiv.org/pdf/1805.04871v1.pdf
Authors: Ma et al. 2018