Using an encoder-decoder with Bahdanau attention to predict MNIST digit sequences
Figure 1. Encoder-decoder model with Bahdanau attention
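For reference, the core of the attention mechanism in the model above is Bahdanau's additive scoring function, which compares the decoder state against every encoder output and produces a weighted context vector. A minimal numpy sketch (the shapes and weight names `W1`, `W2`, `v` are illustrative, not the actual implementation):

```python
import numpy as np

def bahdanau_attention(query, keys, W1, W2, v):
    """Additive (Bahdanau) attention.

    query: (d_dec,)   decoder hidden state at the current timestep
    keys:  (T, d_enc) encoder outputs over T timesteps
    Returns the context vector (d_enc,) and attention weights (T,).
    """
    # score_t = v^T tanh(W1 h_t + W2 s), one score per encoder timestep
    scores = np.tanh(keys @ W1 + query @ W2) @ v      # (T,)
    # softmax over time (shifted by max for numerical stability)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context is the attention-weighted sum of encoder outputs
    context = weights @ keys                          # (d_enc,)
    return context, weights
```

The context vector is then concatenated with the decoder input at each timestep, so the decoder can focus on different regions of the image sequence while emitting each digit.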
Figure 2. Generated digits sequence based on MNIST
I have tested the model with and without Bahdanau attention; the results are shown below. The attention mechanism clearly improves the predictions, but there is one more thing worth noting. The difference between Figure 4 and Figure 5 is the decoder input: the former feeds the ground-truth label at each timestep (teacher forcing), while the latter feeds the prediction the model itself generated at the previous timestep. As the figures show, the latter performs better. This trick is in fact a simplified form of Scheduled Sampling [1], and I expect that using the full method would improve the results further.
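The decoder-input trick above can be sketched as follows. Scheduled Sampling interpolates between the two regimes: at each timestep the decoder is fed the ground-truth label with some probability, and its own previous prediction otherwise, with that probability decayed over training. This is a hedged illustration of the idea, not the code used for the figures; the function names and the decay constant `k` are assumptions:

```python
import numpy as np

def choose_decoder_input(true_label, predicted_label, teacher_forcing_prob, rng):
    """Scheduled sampling at a single timestep: feed the ground-truth label
    with probability `teacher_forcing_prob`, otherwise feed the model's own
    prediction from the previous timestep."""
    if rng.random() < teacher_forcing_prob:
        return true_label
    return predicted_label

def inverse_sigmoid_decay(step, k=1000.0):
    """One of the decay schedules from the Scheduled Sampling paper:
    eps_i = k / (k + exp(i / k)), which starts near 1 (mostly teacher
    forcing) and decays toward 0 (mostly self-predicted input)."""
    return k / (k + np.exp(step / k))
```

Pure teacher forcing is `teacher_forcing_prob = 1` and the always-self-predicted setup of Figure 5 is `teacher_forcing_prob = 0`; scheduled sampling anneals from the first toward the second.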
Figure 3. CER and LOSS of basic encoder-decoder model
Figure 4. CER and LOSS of encoder-decoder model with Bahdanau attention
Figure 5. CER and LOSS of encoder-decoder model with Bahdanau attention using self-predicted value as decoder input
[1] Bengio, Samy, et al. "Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks." Advances in Neural Information Processing Systems, 2015.