tuanh123789 / AdaSpeech

An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"

AdaSpeech - PyTorch Implementation

This is an unofficial PyTorch implementation of AdaSpeech, as described in the paper "AdaSpeech: Adaptive Text to Speech for Custom Voice".

This project is based on ming024's implementation of FastSpeech 2.

Requirements:

pip install -r requirements.txt

Training

Preprocessing

Run the preprocessing script:

python preprocess.py config/pretrain/preprocess.yaml

Training

Train the baseline model with:

python train.py [-h] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
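
The bracketed options above can be filled in as in the following minimal sketch. It only assembles and prints the command. The model and train config file names are assumptions (this README only names config/pretrain/preprocess.yaml), and the vocoder paths are placeholders to replace with your own.

```python
import shlex

def build_train_command(preprocess_config, model_config, train_config,
                        vocoder_checkpoint, vocoder_config):
    """Assemble the train.py invocation from the flags listed in the usage line."""
    return [
        "python", "train.py",
        "-p", preprocess_config,
        "-m", model_config,
        "-t", train_config,
        "--vocoder_checkpoint", vocoder_checkpoint,
        "--vocoder_config", vocoder_config,
    ]

cmd = build_train_command(
    "config/pretrain/preprocess.yaml",
    "config/pretrain/model.yaml",   # assumed file name
    "config/pretrain/train.yaml",   # assumed file name
    "path/to/vocoder_checkpoint",   # placeholder
    "path/to/vocoder_config",       # placeholder
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch training from Python instead of printing the command.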

Finetune

Preprocessing

First, align the corpus with the Montreal Forced Aligner (MFA) to obtain TextGrid files. For best quality, finetune on only one speaker.
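
Before preprocessing, it can help to confirm that alignment produced a TextGrid for every recording. The sketch below is a hypothetical helper, not part of this repository. It assumes the audio files end in .wav, the alignments end in .TextGrid, and that matching files share the same base name.

```python
from pathlib import Path

def find_unaligned(wav_dir, textgrid_dir):
    """Return sorted base names of .wav files with no matching .TextGrid file."""
    # Search recursively, since corpora are often split into per-speaker folders.
    wav_stems = {p.stem for p in Path(wav_dir).rglob("*.wav")}
    textgrid_stems = {p.stem for p in Path(textgrid_dir).rglob("*.TextGrid")}
    return sorted(wav_stems - textgrid_stems)
```

An empty list means every utterance has an alignment. Any names returned are utterances to re-align or drop before running the preprocessing script.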

Run the preprocessing script:

python preprocess.py config/finetune/preprocess.yaml

Finetune

Finetune on the target speaker's voice with:

python finetune.py [-h] [--pretrain_dir BASE_LINE_MODEL_PATH] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
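
Since every option in the usage line above is shown in brackets, the sketch below builds a finetune.py command that includes only the options you supply. It is illustrative only: the keyword names are made up for this example, the baseline model path is a placeholder, and whether the script runs with options omitted depends on its defaults.

```python
def build_finetune_command(**options):
    """Build a finetune.py call, skipping any option that is missing or None."""
    # Map illustrative keyword names to the flags from the usage line.
    flags = {
        "pretrain_dir": "--pretrain_dir",
        "preprocess_config": "-p",
        "model_config": "-m",
        "train_config": "-t",
        "vocoder_checkpoint": "--vocoder_checkpoint",
        "vocoder_config": "--vocoder_config",
    }
    cmd = ["python", "finetune.py"]
    for name, flag in flags.items():
        value = options.get(name)
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

cmd = build_finetune_command(
    pretrain_dir="path/to/baseline_model",  # placeholder
    preprocess_config="config/finetune/preprocess.yaml",
)
print(" ".join(cmd))
```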

TensorBoard

View the training logs with:

tensorboard [--logdir LOG_PATH]

Citation

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}