Abstract
Proposes a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts
Experimental results in a simulated low-resource setting for En-De/De-En show a ~3.0 BLEU improvement over back-translation
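A minimal sketch of what "low-frequency words" can mean in practice: collect the words whose corpus count is at or below a threshold. The function name `rare_words` and the threshold `max_count` are hypothetical. These notes do not say how the paper defines rare words.

```python
from collections import Counter


def rare_words(corpus, max_count):
    """Return the set of words that occur at most `max_count` times.

    `corpus` is a list of tokenized sentences (lists of strings).
    `max_count` is a hypothetical frequency threshold.
    """
    counts = Counter(tok for sent in corpus for tok in sent)
    return {w for w, c in counts.items() if c <= max_count}


if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"],
    ]
    print(sorted(rare_words(corpus, max_count=1)))  # ['dog', 'ran']
```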
Details
Translation Data Augmentation
Substitute a word in both the source and the target sentence
Improper substitutions are filtered out by a language model (LM)
The replacement (rare) word is chosen via the LM
The corresponding word position in the target sentence is found via automatic word alignments trained over the bitext (fast_align)
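Below is a minimal sketch of the substitution step. It is not the authors' implementation. It assumes tokenized sentences and alignments in the Pharaoh `i-j` format that fast_align emits. It also uses two hypothetical inputs: a `lexicon` that maps a rare source word to a target translation, and a `fits` callable that stands in for the LM check. Restricting swaps to source words aligned to exactly one target word is a simplification of my own.

```python
from typing import Callable, Dict, List, Set, Tuple

Alignment = List[Tuple[int, int]]  # (source index, target index) pairs


def parse_pharaoh(line: str) -> Alignment:
    """Parse alignments in 'i-j' format, e.g. '0-0 1-2'."""
    pairs = []
    for item in line.split():
        s, t = item.split("-")
        pairs.append((int(s), int(t)))
    return pairs


def augment_pair(
    src: List[str],
    tgt: List[str],
    alignment: Alignment,
    rare: Set[str],
    lexicon: Dict[str, str],
    fits: Callable[[List[str], str, List[str]], bool],
) -> List[Tuple[List[str], List[str]]]:
    """Generate new sentence pairs by swapping one word on each side.

    fits(left_context, candidate, right_context) is a stand-in for the LM;
    lexicon is a hypothetical rare-source-word -> target-word dictionary.
    """
    new_pairs = []
    for i, word in enumerate(src):
        # Target positions aligned to source position i.
        targets = [j for (s, j) in alignment if s == i]
        # Simplifying assumption: skip source words not aligned to exactly one target word.
        if len(targets) != 1:
            continue
        j = targets[0]
        for cand in sorted(rare):
            if cand == word or cand not in lexicon:
                continue
            # LM filter: reject substitutions that do not fit the source context.
            if not fits(src[:i], cand, src[i + 1:]):
                continue
            new_src = src[:i] + [cand] + src[i + 1:]
            new_tgt = tgt[:j] + [lexicon[cand]] + tgt[j + 1:]
            new_pairs.append((new_src, new_tgt))
    return new_pairs


if __name__ == "__main__":
    src = ["I", "had", "a", "cat"]
    tgt = ["ich", "hatte", "eine", "Katze"]
    alignment = parse_pharaoh("0-0 1-1 2-2 3-3")
    rare = {"giraffe"}
    lexicon = {"giraffe": "Giraffe"}

    # Toy stand-in for an LM: only accept candidates right after "a".
    def toy_fits(left, cand, right):
        return bool(left) and left[-1] == "a"

    for s, t in augment_pair(src, tgt, alignment, rare, lexicon, toy_fits):
        print(" ".join(s), "|||", " ".join(t))
    # prints: I had a giraffe ||| ich hatte eine Giraffe
```

In a real setup, `fits` would query a trained LM, for example by checking whether the candidate is among the LM's top predictions at that position. The `lexicon` would come from the aligned bitext. The toy values here only exercise the control flow.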
Result
Better BLEU score than back-translation, but the margin is not significant
Personal Thoughts
Requires both an LM and a word aligner to augment data
Augmentation focuses on rare words only; it adds no diversity in sentence structure or semantics
Link: https://arxiv.org/pdf/1705.00440.pdf
Authors: Fadaee et al. 2017