keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Question about ConcatOutputAndAttentionWrapper #244

Open Shiro-LK opened 5 years ago

Shiro-LK commented 5 years ago

Hi!

Thanks for sharing. I have a question about the concatenation step that forms the input of the decoder RNN, described in the Tacotron paper as "the context vector and the attention RNN cell output". From my understanding, the context vector is used twice:

In the code, for the decoder RNN, the context vector is the "res_state.attention" variable and the "output" variable is the attention RNN cell output, if I am not mistaken. What I do not understand is why, for the context vector, we use "res_state.attention" instead of "state.attention". Why do we use the context vector of the next timestep rather than the context vector of the current timestep?

The code I am talking about: https://github.com/keithito/tacotron/blob/master/models/rnn_wrappers.py
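
For reference, the wrapper in question looks roughly like this (a condensed sketch of ConcatOutputAndAttentionWrapper, written against the TF 1.x tf.contrib.rnn.RNNCell interface; details may differ from the current version of the linked file):

```python
import tensorflow as tf
from tensorflow.contrib.rnn import RNNCell

class ConcatOutputAndAttentionWrapper(RNNCell):
  '''Concatenates the wrapped cell's output with the attention context vector.

  Expects to wrap a cell that is itself wrapped in a
  tf.contrib.seq2seq.AttentionWrapper, so that the cell's state carries an
  "attention" field holding the context vector.
  '''
  def __init__(self, cell):
    super(ConcatOutputAndAttentionWrapper, self).__init__()
    self._cell = cell

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    # The output is the cell output concatenated with the attention context.
    return self._cell.output_size + self._cell.state_size.attention

  def call(self, inputs, state):
    # "output" is the attention RNN cell output for this step; "res_state" is
    # the state returned by that same call, whose "attention" field is the
    # context vector the question is about.
    output, res_state = self._cell(inputs, state)
    return tf.concat([output, res_state.attention], axis=-1), res_state

  def zero_state(self, batch_size, dtype):
    return self._cell.zero_state(batch_size, dtype)
```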