tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial

Beam Search Optimization #225

Open Edouard360 opened 6 years ago

Edouard360 commented 6 years ago

After experimenting with an NMT-related TensorFlow project - https://github.com/ssampang/im2latex - I realized that my greedy decoder built on GreedyEmbeddingHelper performed better than my BeamSearchDecoder.
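For concreteness, here is roughly the shape of my two inference setups; the toy sizes and variable names below are illustrative, not taken from the project:

```python
import tensorflow as tf
from tensorflow.contrib import seq2seq

# Toy sizes; the real values come from the model config.
batch_size, vocab_size, num_units, beam_width = 4, 100, 32, 5
sos_id, eos_id = 1, 2

embedding = tf.get_variable("embedding", [vocab_size, num_units])
cell = tf.nn.rnn_cell.GRUCell(num_units)
output_layer = tf.layers.Dense(vocab_size, use_bias=False)
start_tokens = tf.fill([batch_size], sos_id)
initial_state = cell.zero_state(batch_size, tf.float32)

# Greedy decoding: the argmax token of step t is fed back at step t+1.
greedy_helper = seq2seq.GreedyEmbeddingHelper(embedding, start_tokens, eos_id)
greedy_decoder = seq2seq.BasicDecoder(cell, greedy_helper, initial_state, output_layer)
greedy_outputs, _, _ = seq2seq.dynamic_decode(greedy_decoder, maximum_iterations=20)

# Beam search decoding: the `beam_width` best partial hypotheses are kept instead.
beam_decoder = seq2seq.BeamSearchDecoder(
    cell=cell,
    embedding=embedding,
    start_tokens=start_tokens,
    end_token=eos_id,
    initial_state=seq2seq.tile_batch(initial_state, multiplier=beam_width),
    beam_width=beam_width,
    output_layer=output_layer)
beam_outputs, _, _ = seq2seq.dynamic_decode(beam_decoder, maximum_iterations=20)
```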

According to this paper, this is probably due to my training strategy, which doesn't account for exposure bias: the model is never exposed to its own errors during training. Indeed, I am using the seq2seq TrainingHelper, which only trains one step ahead, feeding in the gold token at every step to predict the next output.
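To make the training-side contrast concrete, this is essentially my current setup, reusing the names from the sketch above (again illustrative):

```python
# Gold targets; in the real pipeline these are the shifted reference tokens.
gold_ids = tf.random_uniform([batch_size, 20], maxval=vocab_size, dtype=tf.int32)
decoder_inputs = tf.nn.embedding_lookup(embedding, gold_ids)
target_lengths = tf.fill([batch_size], 20)

# Pure teacher forcing: TrainingHelper feeds the gold token at every step,
# so the decoder is never exposed to its own predictions during training.
train_helper = seq2seq.TrainingHelper(decoder_inputs, target_lengths)
train_decoder = seq2seq.BasicDecoder(cell, train_helper, initial_state, output_layer)
train_outputs, _, _ = seq2seq.dynamic_decode(train_decoder)
```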

Instead, the paper reports significant improvements from a beam-search-optimized training strategy. Do you know whether using ScheduledEmbeddingTrainingHelper would suffice to account for this exposure bias? It looks very close to what is done in the paper, except that only one trajectory would be considered here.
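For what it's worth, the drop-in change I have in mind is something like the following sketch; the annealing schedule for sampling_probability is my own guess, not something from the paper:

```python
# Scheduled sampling: with probability `sampling_probability`, the token fed
# back at step t+1 is sampled from the model's own output distribution at
# step t instead of taken from the gold sequence.
global_step = tf.train.get_or_create_global_step()
sampling_probability = tf.minimum(  # anneal 0 -> 0.5 over 100k steps (my guess)
    0.5, tf.to_float(global_step) / 100000.0)

ss_helper = seq2seq.ScheduledEmbeddingTrainingHelper(
    inputs=decoder_inputs,
    sequence_length=target_lengths,
    embedding=embedding,
    sampling_probability=sampling_probability)
ss_decoder = seq2seq.BasicDecoder(cell, ss_helper, initial_state, output_layer)
ss_outputs, _, _ = seq2seq.dynamic_decode(ss_decoder)
# Note: this still follows a single sampled trajectory per step, with no beam,
# which is exactly the gap relative to the paper's beam search optimization.
```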