keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Question about ConcatOutputAndAttentionWrapper #244

Open Shiro-LK opened 5 years ago

Shiro-LK commented 5 years ago

Hi!

Thanks for sharing. I have a question about the concatenation step that forms the input of the decoder RNN, described in the Tacotron paper as "the context vector and the attention RNN cell output". From my understanding, the context vector is used twice:

In the code, for the decoder RNN, the context vector is the "res_state.attention" variable and the "output" variable is the attention RNN cell output, if I am not mistaken. What I do not understand is why, for the context vector, we use "res_state.attention" instead of "state.attention". Why do we use the context vector of the next timestep rather than the context vector of the current timestep?

The code I am talking about: https://github.com/keithito/tacotron/blob/master/models/rnn_wrappers.py
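
For reference, the wrapper in question looks roughly like this (a condensed sketch of ConcatOutputAndAttentionWrapper, written against the TF 1.x tf.contrib.rnn.RNNCell interface; details may differ from the current version of the linked file):

```python
import tensorflow as tf
from tensorflow.contrib.rnn import RNNCell

class ConcatOutputAndAttentionWrapper(RNNCell):
  '''Concatenates the wrapped cell's output with the attention context vector.

  Expects to wrap a cell that is itself wrapped in a
  tf.contrib.seq2seq.AttentionWrapper, so that the cell's state carries an
  "attention" field holding the context vector.
  '''
  def __init__(self, cell):
    super(ConcatOutputAndAttentionWrapper, self).__init__()
    self._cell = cell

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    # The output is the cell output concatenated with the attention context.
    return self._cell.output_size + self._cell.state_size.attention

  def call(self, inputs, state):
    # "output" is the attention RNN cell output for this step; "res_state" is
    # the state returned by that same call, whose "attention" field is the
    # context vector the question is about.
    output, res_state = self._cell(inputs, state)
    return tf.concat([output, res_state.attention], axis=-1), res_state

  def zero_state(self, batch_size, dtype):
    return self._cell.zero_state(batch_size, dtype)
```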