xcmyz / FastSpeech

An implementation of FastSpeech based on PyTorch.

MIT License


deep-learning pytorch speech-synthesis


FastSpeech-Pytorch


Update (2020/07/20)

1. Optimize the training process.
2. Optimize the implementation of the length regulator.
3. Use the same hyperparameters as FastSpeech2.
4. Together, changes 1, 2 and 3 make training 3 times faster than before.
5. Better speech quality.
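The length regulator is the FastSpeech component that expands the phoneme-level encoder output to the length of the mel-spectrogram by repeating each phoneme's hidden state once per frame of its duration. The following is a minimal pure-Python sketch of that idea for a single utterance. The function name length_regulate is hypothetical, and this is not the repository's PyTorch code, which works on padded, batched tensors.

```python
from typing import List, Sequence


def length_regulate(hidden: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Expand phoneme-level hidden states to frame level (hypothetical helper).

    hidden[i] is repeated durations[i] times, so the output has
    sum(durations) rows, one per mel frame.
    """
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector d times; a zero duration drops the phoneme.
        frames.extend([list(vec) for _ in range(d)])
    return frames


# Three phonemes with 2-dim hidden states lasting 2, 1 and 3 frames.
h = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
out = length_regulate(h, [2, 1, 3])
print(len(out))  # 6
```

In PyTorch, the same per-utterance expansion is usually written with torch.repeat_interleave along the phoneme axis. Padding the results to the longest sequence in the batch is a separate step.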

Model

My Blog

Prepare Dataset

1. Download and extract the LJSpeech dataset.
2. Put the LJSpeech dataset in the data folder.
3. Unzip alignments.zip.
4. Put the NVIDIA pretrained WaveGlow model in waveglow/pretrained_model and rename it to waveglow_256channels.pt.
5. Run python3 preprocess.py.
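Before running preprocess.py, it can help to confirm that the preparation steps produced the files the scripts expect. The script below is a hedged sketch of such a check. Only the WaveGlow path comes from the steps above. The LJSpeech folder name (LJSpeech-1.1, the name of the official archive's top-level folder) and the alignments directory name are assumptions, so compare them against the paths preprocess.py actually reads.

```python
from pathlib import Path
from typing import List

# Paths relative to the repository root.
# ASSUMPTIONS: LJSpeech extracted to data/LJSpeech-1.1 and alignments.zip
# unzipped into a directory called "alignments". Only the WaveGlow path
# is taken from the README.
EXPECTED = [
    "data/LJSpeech-1.1/metadata.csv",
    "data/LJSpeech-1.1/wavs",
    "alignments",
    "waveglow/pretrained_model/waveglow_256channels.pt",
]


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return every expected path that does not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing before preprocessing:")
        for p in missing:
            print("  " + p)
    else:
        print("Layout looks complete; run python3 preprocess.py")
```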

Training

Run python3 train.py.
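The FastSpeech paper trains the model with a mel-spectrogram reconstruction loss plus a mean-squared-error loss on phoneme durations predicted in the log domain, using the teacher alignments as duration targets. The sketch below shows that objective for one utterance in pure Python. The helper names, the log(d + 1) target (chosen so zero-length phonemes stay finite) and the equal weighting of the two terms are assumptions. The repository's own loss code may use different terms, such as a separate post-net mel loss, or different weights.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fastspeech_loss(mel_pred: Sequence[float], mel_target: Sequence[float],
                    log_dur_pred: Sequence[float],
                    dur_target: Sequence[int]) -> float:
    """Mel reconstruction MSE plus log-domain duration MSE (hypothetical).

    mel_pred / mel_target: flattened mel values of one utterance.
    log_dur_pred: predicted log-durations, one per phoneme.
    dur_target: integer frame counts from the teacher alignment,
    compared as log(d + 1) (an assumption of this sketch).
    """
    mel_loss = mse(mel_pred, mel_target)
    log_dur_target = [math.log(d + 1) for d in dur_target]
    dur_loss = mse(log_dur_pred, log_dur_target)
    return mel_loss + dur_loss


# Perfect predictions give zero loss.
print(fastspeech_loss([0.5, 1.0], [0.5, 1.0],
                      [math.log(3), math.log(2)], [2, 1]))  # 0.0
```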

Evaluation

Run python3 eval.py.
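At synthesis time there is no teacher alignment, so the durations fed to the length regulator come from the duration predictor. The FastSpeech paper also describes multiplying every duration by a factor α to control speaking speed: α > 1 gives slower speech and α < 1 gives faster speech. The sketch below converts predicted log-durations into integer frame counts under that scheme. It assumes the log(d + 1) parameterisation and uses a hypothetical function name. Whether eval.py exposes such a speed factor is not stated in this README.

```python
import math
from typing import List, Sequence


def durations_from_log(log_durations: Sequence[float],
                       alpha: float = 1.0) -> List[int]:
    """Turn predicted log(d + 1) values into integer frame counts (hypothetical).

    Each duration is scaled by alpha, rounded to the nearest frame and
    clamped at zero so the length regulator never sees a negative count.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    frames = []
    for ld in log_durations:
        d = math.exp(ld) - 1.0  # invert the assumed log(d + 1) target
        frames.append(max(0, round(d * alpha)))
    return frames


pred = [math.log(3), math.log(5)]       # roughly 2 and 4 frames
print(durations_from_log(pred))         # [2, 4]  normal speed
print(durations_from_log(pred, 0.5))    # [1, 2]  faster
print(durations_from_log(pred, 1.5))    # [3, 6]  slower
```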

Notes

In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I use Tacotron2 instead.
I use the same hyperparameters as FastSpeech2.
Audio examples are in the sample directory.
Pretrained model.
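In the FastSpeech paper, each phoneme's duration is read from the teacher's encoder-decoder attention. Every mel frame is assigned to the phoneme it attends to most, and a phoneme's duration is the number of frames assigned to it. The following is a pure-Python sketch of that counting step for one attention matrix, such as one taken from a Tacotron2 teacher. The function name is hypothetical. This README does not describe exactly how the contents of alignments.zip were produced.

```python
from typing import List, Sequence


def durations_from_attention(attention: Sequence[Sequence[float]],
                             num_phonemes: int) -> List[int]:
    """Count, per phoneme, the mel frames whose attention peaks on it.

    attention[t][i] is the weight mel frame t places on phoneme i
    (rows: decoder steps / mel frames, columns: encoder positions).
    Ties go to the earliest phoneme, because max() returns the first maximum.
    """
    durations = [0] * num_phonemes
    for row in attention:
        if len(row) != num_phonemes:
            raise ValueError("each row needs one weight per phoneme")
        best = max(range(num_phonemes), key=lambda i: row[i])
        durations[best] += 1
    return durations


# 5 mel frames attending over 3 phonemes.
att = [
    [0.9, 0.1, 0.0],
    [0.6, 0.4, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
    [0.0, 0.1, 0.9],
]
print(durations_from_attention(att, 3))  # [2, 1, 2]
```

The durations always sum to the number of mel frames, which is what lets the length regulator reproduce the target spectrogram length during training.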

Reference

Repository

Paper