Proposes a new training approach that uses both the sentence and its bag-of-words as targets during the training stage
Adding the bag-of-words target encourages the model to generate potentially correct sentences, instead of punishing all non-reference sentences equally as incorrect
On the NIST Zh-En set, BLEU improves by 4.55
Details
Introduction
Problem
NMT training treats the target sentence as the only gold label, so semantically or syntactically close sentences are penalized just as heavily as completely incorrect ones.
Solution
add a bag-of-words loss that rewards the model when correct tokens are generated, even at incorrect positions
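A toy illustration (mine, not from the paper) of why a bag-of-words view gives credit for correct tokens in wrong positions: a reordered hypothesis that gets almost no credit position-by-position still matches the reference perfectly at the multiset level.

```python
from collections import Counter

reference = "the cat sat on the mat".split()
hypothesis = "on the mat the cat sat".split()  # correct tokens, wrong order

# Position-wise comparison (roughly how word-by-word MLE "sees" it): no credit.
positional_matches = sum(r == h for r, h in zip(reference, hypothesis))
print(positional_matches, "/", len(reference))   # 0 / 6

# Bag-of-words comparison: every token is accounted for.
bow_overlap = sum((Counter(reference) & Counter(hypothesis)).values())
print(bow_overlap, "/", len(reference))          # 6 / 6
```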
Overview
the bag-of-words loss is computed by summing the decoder's softmax outputs over all time steps and comparing the aggregate against the reference bag-of-words under an MLE objective
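A minimal sketch of that computation as described in the note above (assuming PyTorch; the per-step softmax distributions are summed over time and renormalized, and the loss is the negative log-likelihood of each word in the reference bag; names are my own, not the paper's):

```python
import torch

def bag_of_words_loss(decoder_logits, ref_bow_ids, eps=1e-8):
    """
    decoder_logits: (T, V) logits from the decoder, one row per output step.
    ref_bow_ids:    1-D LongTensor of vocabulary ids forming the reference
                    bag-of-words (duplicates kept, so repeated words count more).
    Returns the negative log-likelihood of the reference bag under the
    time-aggregated output distribution.
    """
    step_probs = torch.softmax(decoder_logits, dim=-1)   # (T, V) per-step softmax
    summed = step_probs.sum(dim=0)                       # sum over all decoding steps
    bow_probs = summed / summed.sum()                    # renormalize to a distribution
    nll = -torch.log(bow_probs[ref_bow_ids] + eps)       # NLL of each reference word
    return nll.sum()

# Toy usage: T=4 decoding steps over a vocabulary of size V=10.
logits = torch.randn(4, 10)
ref_bow = torch.tensor([2, 5, 5, 7])
print(bag_of_words_loss(logits, ref_bow).item())
```

In training this term would be added to the usual word-by-word cross-entropy (the note describes both the sentence and the bag-of-words as targets); the exact aggregation and weighting may differ from this sketch.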
Result
The NIST Zh-En training set has 1.25M sentence pairs, which is quite small for NMT
The bag-of-words target improves the model by +4.55 BLEU, but note that SMT (Moses) achieves almost the same performance as the Seq2Seq+Att baseline.
Personal Thoughts
Good problem definition, and the solution is both intuitive and straightforward
NMT usually performs much better than SMT, but on this dataset the SMT Moses and Seq2Seq+Att baselines are too close in performance. So what exactly is the bag-of-words target improving?
Link: https://arxiv.org/pdf/1805.04871v1.pdf
Authors: Ma et al. 2018