This is an unofficial PyTorch implementation of AdaSpeech: Adaptive Text to Speech for Custom Voice.
This project is based on ming024's implementation of FastSpeech 2.
Utterance-level encoder and phoneme-level encoder to improve acoustic generalization
Conditional layer norm, which is the core of the AdaSpeech paper
nvcc --version
pip install -r requirements.txt
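The conditional layer norm listed among the features can be illustrated with a small sketch. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not taken from this repository: the helper names (`layer_norm`, `linear`, `conditional_layer_norm`) and the toy weights are assumptions for illustration. The idea shown is that the scale and bias of layer normalization are predicted from the speaker embedding by two small linear projections instead of being fixed learned vectors.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one hidden vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, vec):
    """Compute W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def conditional_layer_norm(x, speaker_emb, scale_proj, bias_proj):
    """Layer norm whose scale and bias come from the speaker embedding.

    scale_proj and bias_proj are hypothetical (weight, bias) pairs that map
    the speaker embedding to one scale and one bias value per hidden unit.
    """
    scale = linear(*scale_proj, speaker_emb)
    bias = linear(*bias_proj, speaker_emb)
    return [g * h + b for g, h, b in zip(scale, layer_norm(x), bias)]


# Toy example: 4 hidden units, 2-dimensional speaker embedding.
hidden = [1.0, 2.0, 3.0, 4.0]
speaker = [1.0, 0.0]
scale_proj = ([[2.0, 0.0]] * 4, [0.0] * 4)  # predicts scale = 2.0 for this speaker
bias_proj = ([[0.0, 1.0]] * 4, [0.5] * 4)   # predicts bias = 0.5 for this speaker
print(conditional_layer_norm(hidden, speaker, scale_proj, bias_proj))
```

Because only the two projections depend on the speaker, adapting to a new voice can touch a small number of parameters while the rest of the model stays frozen, which is the motivation given in the AdaSpeech paper.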
Run the preprocessing script:
python preprocess.py config/pretrain/preprocess.yaml
Train the baseline model with:
python train.py [-h] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
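To make the usage line above easier to read, here is a hedged sketch of an `argparse` parser that would produce the same set of flags. It is not the repository's actual `train.py`: the short flags and the two `--vocoder_*` options come from the usage line, while the long names `--preprocess_config`, `--model_config`, and `--train_config` are assumptions added for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in the usage line (long names assumed)."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("-p", "--preprocess_config", metavar="PREPROCESS_CONFIG_PATH",
                        help="path to the preprocessing config (YAML)")
    parser.add_argument("-m", "--model_config", metavar="MODEL_CONFIG_PATH",
                        help="path to the model config (YAML)")
    parser.add_argument("-t", "--train_config", metavar="TRAIN_CONFIG_PATH",
                        help="path to the training config (YAML)")
    parser.add_argument("--vocoder_checkpoint", metavar="VOCODER_CHECKPOINT_PATH",
                        help="path to the vocoder checkpoint")
    parser.add_argument("--vocoder_config", metavar="VOCODER_CONFIG_PATH",
                        help="path to the vocoder config")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["-p", "config/pretrain/preprocess.yaml"])
print(args.preprocess_config)
```

Every option is optional in this sketch, matching the square brackets in the usage line; options that are not passed are left as `None`.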
First, align the corpus with the Montreal Forced Aligner (MFA) to obtain TextGrid files. Note: fine-tune on only one speaker for best quality.
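Before preprocessing, it can help to confirm that every utterance actually received an alignment. The helper below is a minimal sketch, not part of this repository: the function name `missing_alignments` is hypothetical, and it assumes the audio files end in `.wav`, the alignments end in `.TextGrid`, and matching files share the same base name.

```python
from pathlib import Path


def missing_alignments(wav_dir, textgrid_dir):
    """Return sorted base names of .wav files that have no matching .TextGrid file."""
    wav_stems = {p.stem for p in Path(wav_dir).rglob("*.wav")}
    aligned_stems = {p.stem for p in Path(textgrid_dir).rglob("*.TextGrid")}
    return sorted(wav_stems - aligned_stems)


# Example call (both directory names are placeholders, not paths from this project):
# print(missing_alignments("my_corpus/wavs", "my_corpus/TextGrid"))
```

An empty list means every recording has an alignment; any names returned are utterances the aligner skipped, which is worth checking when the fine-tuning set contains only a handful of sentences.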
Then run the preprocessing script:
python preprocess.py config/finetune/preprocess.yaml
Fine-tune on the speaker's voice with:
python finetune.py [-h] [--pretrain_dir BASE_LINE_MODEL_PATH] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
View the training logs with:
tensorboard [--logdir LOG_PATH]
TensorBoard for the pretrained model
TensorBoard for fine-tuning with only 5 sentences
@misc{chen2021adaspeech,
title={AdaSpeech: Adaptive Text to Speech for Custom Voice},
author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
year={2021},
eprint={2103.00993},
archivePrefix={arXiv},
primaryClass={eess.AS}
}