rishikksh20 / gmvae_tacotron

Gaussian Mixture VAE Tacotron
MIT License

GMVAE Tacotron-2:

Tensorflow Unofficial Implementation of HIERARCHICAL GENERATIVE MODELING FOR CONTROLLABLE SPEECH SYNTHESIS
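The paper's key idea is a hierarchical latent space with a Gaussian-mixture prior over the style/speaker latents. As a minimal sketch of that prior (NumPy only; the function name and all toy values below are illustrative assumptions, not the repo's code):

```python
import numpy as np

def sample_gmvae_prior(n_samples, means, stds, weights, seed=None):
    """Hypothetical sketch: draw latents from a Gaussian-mixture prior.

    First pick a mixture component y ~ Categorical(weights), then draw
    z ~ N(means[y], diag(stds[y]**2)) via the reparameterization trick.
    """
    rng = np.random.default_rng(seed)
    n_components, latent_dim = means.shape
    ys = rng.choice(n_components, size=n_samples, p=weights)  # component ids
    eps = rng.standard_normal((n_samples, latent_dim))        # unit Gaussian noise
    zs = means[ys] + stds[ys] * eps                           # reparameterized draw
    return ys, zs

# Toy example: 3 mixture components in a 16-dim latent space.
means = np.stack([np.full(16, c) for c in (-2.0, 0.0, 2.0)])
stds = np.full((3, 16), 0.1)
weights = np.array([0.2, 0.5, 0.3])
ys, zs = sample_gmvae_prior(1000, means, stds, weights, seed=0)
```

In the paper, each mixture component ends up capturing a distinct style or speaker attribute, which is what makes the synthesis controllable.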

Repository Structure:

Tacotron-2
├── datasets
├── LJSpeech-1.1    (0)
│   └── wavs
├── logs-Tacotron   (2)
│   ├── mel-spectrograms
│   ├── plots
│   ├── pretrained
│   └── wavs
├── papers
├── tacotron
│   ├── models
│   └── utils
├── tacotron_output (3)
│   ├── eval
│   ├── gta
│   ├── logs-eval
│   │   ├── plots
│   │   └── wavs
│   └── natural
└── training_data   (1)
    ├── audio
    └── mels

The tree above shows the current state of the repository.

Requirements

First, you need Python 3.5 installed along with TensorFlow v1.6.

Next, install the requirements:

pip install -r requirements.txt

or, if pip targets Python 2 on your system:

pip3 install -r requirements.txt

Dataset:

This repo was tested on the LJSpeech dataset, which contains almost 24 hours of labeled recordings of a single female speaker.

Preprocessing

Before running the following steps, please make sure you are inside the Tacotron-2 folder:

cd Tacotron-2

Preprocessing can then be started using:

python preprocess.py

or

python3 preprocess.py

The dataset can be chosen using the --dataset argument. The default is LJSpeech.
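Preprocessing for Tacotron-style models typically extracts log-mel spectrograms from the wavs (written to training_data/mels alongside the audio). As a hedged NumPy sketch of that extraction step (the parameters n_fft=1024, hop=256, n_mels=80 are common defaults, assumed here, not necessarily the repo's exact settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel filters mapping FFT bins to n_mels mel bands."""
    fmax = fmax or sr / 2
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Frame, window, FFT, apply mel filterbank, then log-compress."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # (frames, n_fft//2+1)
    mel = mag @ mel_filterbank(n_mels, n_fft, sr).T     # (frames, n_mels)
    return np.log(np.maximum(mel, 1e-5))

# Toy usage: one second of a 440 Hz tone at LJSpeech's 22.05 kHz rate.
sr = 22050
wav = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
mels = log_mel_spectrogram(wav, sr=sr)
```

Each row of the result is one mel-spectrogram frame; these frames are what the feature prediction network is trained to generate.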

Training:

The feature prediction model can be trained using:

python train.py --model='Tacotron'

or

python3 train.py --model='Tacotron'

Synthesis

There are three types of mel-spectrogram synthesis for the spectrogram prediction network (Tacotron). For example, in eval mode with a reference audio:

python synthesize.py --model='Tacotron' --mode='eval' --reference_audio='ref_1.wav'

or

python3 synthesize.py --model='Tacotron' --mode='eval' --reference_audio='ref_1.wav'
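Once the network has predicted spectrogram frames, a phase-reconstruction algorithm such as Griffin-Lim is commonly used to turn them back into a waveform. A minimal NumPy sketch of Griffin-Lim (not the repo's implementation; frame parameters and iteration count are illustrative assumptions):

```python
import numpy as np

def griffin_lim(mag, n_fft=1024, hop=256, n_iters=30, seed=0):
    """Recover a waveform from a magnitude spectrogram (frames, n_fft//2+1)
    by iterating: fix the magnitudes, re-estimate the phase."""
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    window = np.hanning(n_fft)

    def istft(spec):
        # inverse FFT per frame, then windowed overlap-add
        frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
        out = np.zeros((spec.shape[0] - 1) * hop + n_fft)
        norm = np.zeros_like(out)
        for i, frame in enumerate(frames):
            out[i * hop:i * hop + n_fft] += frame
            norm[i * hop:i * hop + n_fft] += window ** 2
        return out / np.maximum(norm, 1e-8)

    def stft(wav):
        n_frames = 1 + (len(wav) - n_fft) // hop
        frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                           for i in range(n_frames)])
        return np.fft.rfft(frames, n=n_fft, axis=1)

    for _ in range(n_iters):
        wav = istft(mag * angles)              # impose known magnitudes
        angles = np.exp(1j * np.angle(stft(wav)))  # keep only the new phase
    return istft(mag * angles)

# Toy usage: invert the magnitude spectrogram of a short sine tone.
sr, n_fft, hop = 22050, 1024, 256
wav_in = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
win = np.hanning(n_fft)
n_frames = 1 + (len(wav_in) - n_fft) // hop
frames = np.stack([wav_in[i * hop:i * hop + n_fft] * win
                   for i in range(n_frames)])
mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
wav_out = griffin_lim(mag)
```

In practice a mel spectrogram must first be mapped back to a linear-frequency magnitude spectrogram (e.g. via a pseudo-inverse of the mel filterbank) before Griffin-Lim is applied; the sketch above starts from a linear magnitude spectrogram for simplicity.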


Pretrained model and Samples:

TODO

References and Resources:

Work in progress