soobinseo / Transformer-TTS

A PyTorch implementation of "Neural Speech Synthesis with Transformer Network"
MIT License

Transformer-TTS

Requirements

Data

Pretrained Model

Attention plots

Self Attention encoder

Self Attention decoder

Attention encoder-decoder

Learning curves & Alphas

Experimental notes

  1. The learning rate is an important training parameter. An initial learning rate of 0.001 with exponential decay did not work.
  2. Gradient clipping is also important for training. I clipped the gradients to a norm of 1.
  3. With the stop token loss, the model did not train.
  4. Concatenating the input and context vectors in the attention mechanism was very important.
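Notes 2 and 4 above can be sketched in PyTorch. The snippet below is illustrative only, not the repo's actual modules: the layer sizes and the `ConcatAttention` class are hypothetical, but it shows the two ingredients the notes describe, clipping gradients to norm 1 before the optimizer step, and concatenating the attention input with the context vector before the output projection.

```python
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    """Scaled dot-product attention that concatenates the query input
    with the attention context before the output projection (note 4).
    Dimensions are illustrative, not the repo's exact configuration."""
    def __init__(self, dim):
        super().__init__()
        self.out = nn.Linear(dim * 2, dim)  # [input ++ context] -> dim

    def forward(self, query, key, value):
        scores = torch.matmul(query, key.transpose(-2, -1)) / key.size(-1) ** 0.5
        context = torch.matmul(torch.softmax(scores, dim=-1), value)
        # concatenate the input (query) with the context vector (note 4)
        return self.out(torch.cat([query, context], dim=-1))

model = ConcatAttention(dim=64)
q = torch.randn(2, 10, 64)          # (batch, time, dim)
out = model(q, q, q)                # self-attention: query = key = value

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
out.sum().backward()                # dummy loss for illustration
# note 2: clip gradients to norm 1 before stepping the optimizer
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

The output shape matches the input, `(batch, time, dim)`, so the module can drop into a residual block like a standard attention layer.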

Generated Samples

File description

Training the network

Generate TTS wav file

Reference
