lgit2017 opened this issue 6 years ago
@lgit2017 this is not done inside of the BasicDecoder, right?
I tried, but it failed to learn an alignment (after running for about 30k steps). It's on my TODO list to figure out why, but I haven't had much time to work on this repo lately. If you have time to give it a shot, please let me know how it goes!
@keithito what data are you feeding into the DecoderRNN? For Tacotron 2 they use the attention context and the decoder's previous output, not the attention's previous output. I think you're doing the latter...
https://github.com/keithito/tacotron/blob/tacotron2-work-in-progress/models/tacotron.py#L65
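To make the distinction concrete, here's a minimal NumPy sketch of the decoder input I'm describing (dimensions and the `prenet` stand-in are my own assumptions, not the repo's hyperparameters): the previous predicted frame goes through the pre-net and is concatenated with the attention context, and that concatenation is what feeds the decoder LSTM.

```python
import numpy as np

# Illustrative dimensions only -- assumptions, not the repo's hparams.
batch, mel_dim, prenet_dim, context_dim = 32, 80, 256, 512

prev_frame = np.zeros((batch, mel_dim))             # decoder's previous output frame
attention_context = np.zeros((batch, context_dim))  # context vector from the attention mechanism

def prenet(x, out_dim=prenet_dim):
    # Stand-in for Tacotron 2's two-layer ReLU pre-net (the real one also uses dropout).
    w = np.random.randn(x.shape[-1], out_dim) * 0.01
    return np.maximum(x @ w, 0.0)

# Tacotron 2 decoder input: pre-net(previous frame) concatenated with the
# attention context -- not the attention RNN's previous output.
lstm_input = np.concatenate([prenet(prev_frame), attention_context], axis=-1)
assert lstm_input.shape == (batch, prenet_dim + context_dim)
```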
@keithito In the Tacotron 2 paper (https://arxiv.org/abs/1712.05884), the authors mention that "The concatenation of the LSTM output and the attention context vector is then projected through a linear transform to produce a prediction of the target spectrogram frame." Was there a reason you did not concatenate the attention context vector with the last LSTM output?
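In code, that sentence from the paper describes roughly the following (a NumPy sketch with assumed dimensions, not the repo's actual code):

```python
import numpy as np

# Assumed dimensions for illustration; not taken from the repo.
batch, lstm_dim, context_dim, mel_dim = 32, 1024, 512, 80

lstm_output = np.zeros((batch, lstm_dim))           # output of the decoder LSTM
attention_context = np.zeros((batch, context_dim))  # attention context vector

# Per the paper: concatenate the LSTM output with the attention context,
# then apply a linear transform to predict the target spectrogram frame.
proj_w = np.random.randn(lstm_dim + context_dim, mel_dim) * 0.01
frame_prediction = np.concatenate([lstm_output, attention_context], axis=-1) @ proj_w
assert frame_prediction.shape == (batch, mel_dim)
```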