This is the official code repository for the paper "OverFlow: Putting flows on top of neural transducers for better TTS". For audio examples, visit our demo page. Pre-trained models (female and male) are also available.
OverFlow is now also available in Coqui TTS, making it easier to use and experiment with! You can find the training recipe under `recipes/ljspeech/overflow`; more recipes are rolling out soon!

```bash
# Install TTS
pip install tts

# Change --text to the desired text prompt
# Change --out_path to the desired output path
tts --text "Hello world!" \
    --model_name tts_models/en/ljspeech/overflow \
    --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 \
    --out_path output.wav
```
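If you prefer Python over the CLI, Coqui TTS also exposes a Python API; below is a minimal sketch (the model name is taken from the CLI call above; check the Coqui TTS docs for the exact API of your installed version):

```python
# Same synthesis as the CLI example, via Coqui TTS's Python API.
from TTS.api import TTS

# Loads the OverFlow LJ Speech model (its default vocoder is used).
tts = TTS(model_name="tts_models/en/ljspeech/overflow")
tts.tts_to_file(text="Hello world!", file_path="output.wav")
```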
The current plan is to maintain both repositories.
To set up and train the model using LJ Speech:

1. Download and extract the LJ Speech dataset into the `data` folder such that the directory becomes `data/LJSpeech-1.1`; otherwise, update the filelists in `data/filelists` accordingly. (A download sketch follows this list.)
2. Clone this repository: `git clone https://github.com/shivammehta25/OverFlow.git`
   - Gradient checkpointing trades training speed for GPU memory; if you have memory to spare, set `src/hparams.gradient_checkpoint=False` for faster training.
3. Initialise the submodules: `git submodule init; git submodule update`
4. Install the requirements: `pip install -r requirements.txt`
5. Alternatively, run `bash start.sh`; it will install all the dependencies and run the container.
6. Check `src/hparams.py` for hyperparameters and set the GPUs:
   - For multi-GPU training, set GPUs to `[0, 1, ...]`.
   - For CPU training, set GPUs to `[]`.
7. Run `python generate_data_properties.py` to generate `data_parameters.pt` for your dataset (a default `data_parameters.pt` for LJ Speech is included in the repository).
8. Run `python train.py` to train the model.
   - Checkpoints will be saved to `hparams.checkpoint_dir`.
   - TensorBoard logs will be saved to `hparams.tensorboard_log_dir`.
9. To resume training from a checkpoint, run `python train.py -c <CHECKPOINT_PATH>`.
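For step 1, here is a minimal download sketch using only the Python standard library (the URL is the dataset's official download link on keithito.com; `wget` plus `tar` works just as well):

```python
# Fetch and unpack LJ Speech so that data/LJSpeech-1.1 exists.
import tarfile
import urllib.request

url = "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2"
urllib.request.urlretrieve(url, "LJSpeech-1.1.tar.bz2")
with tarfile.open("LJSpeech-1.1.tar.bz2", "r:bz2") as archive:
    archive.extractall("data")  # creates data/LJSpeech-1.1/
```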
To synthesise speech, run Jupyter and open `synthesis.ipynb`, or use the `overflow_speak.py` script:

```bash
# Synthesise from a text prompt
python overflow_speak.py -t "Hello world" --checkpoint_path <CHECKPOINT_PATH> --hifigan_checkpoint_path <HIFIGAN_PATH> --hifigan_config <HIFIGAN_CONFIG_PATH>

# Synthesise from a text file
python overflow_speak.py -f <FILENAME> --checkpoint_path <CHECKPOINT_PATH> --hifigan_checkpoint_path <HIFIGAN_PATH> --hifigan_config <HIFIGAN_CONFIG_PATH>
```
In `src/hparams.py`, set `hparams.precision` to `16` for mixed-precision training or `32` for full-precision training.
Set `hparams.gpus` to `[0, 1, 2]` for multi-GPU training or to a single-element list `[0]` for single-GPU training.
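Illustratively, these two settings in `src/hparams.py` might look like the sketch below (names as used above; the file's actual layout and its remaining hyperparameters are not reproduced here):

```python
# Sketch of the src/hparams.py settings discussed above (not the full file).
precision = 16    # 16 = mixed-precision training, 32 = full precision
gpus = [0, 1]     # GPU ids: use [0] for single-GPU, [] for CPU training
```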
Known issue: you may encounter the following error:

```
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/data.py)
```
This happens because newer torchmetrics releases removed `get_num_classes`. The environment uses `torch==1.11.0a0+b6df043` (installed with `--extra-index-url https://download.pytorch.org/whl/cu113`); pinning `torchmetrics==0.6.0`, e.g. via `pip install torchmetrics==0.6.0`, resolves the error.
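As a quick diagnostic, you can print the installed versions (a minimal sketch; the expected values are the pins listed above):

```python
# Print installed versions to verify the pins above took effect.
import torch
import torchmetrics

print(torch.__version__)         # expect 1.11.0a0+b6df043 in the container
print(torchmetrics.__version__)  # expect 0.6.0 after pinning
```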
If you have any questions or comments, please open an issue on our GitHub repository.
If you use or build on our method or code for your research, please cite our paper:
```bibtex
@inproceedings{mehta2023overflow,
  title={{O}ver{F}low: {P}utting flows on top of neural transducers for better {TTS}},
  author={Mehta, Shivam and Kirkland, Ambika and Lameris, Harm and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. Interspeech},
  pages={4279--4283},
  doi={10.21437/Interspeech.2023-1996},
  year={2023}
}
```
The code implementation is based on Nvidia's implementation of Tacotron 2 and on Glow TTS, and uses PyTorch Lightning for boilerplate-free code.