shaojinding / Adversarial-Many-to-Many-VC

[InterSpeech 2020] "Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition" by Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna

Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Code for the paper "Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition".

Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna

Accepted at INTERSPEECH 2020.

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time Voice Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Dataset: VCTK

Requirements

Data preprocessing

We use the speaker encoder model and vocoder model from here. We only train the voice conversion model (i.e., synthesizer).

Before running, put the speaker encoder at encoder/saved_models/pretrained.pt and the vocoder at vocoder/saved_models/pretrained/pretrained.pt.
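As a quick sanity check before preprocessing, the two model locations above can be verified with a small script. This is a minimal sketch, not part of the repository: the helper name missing_models is hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Paths taken from the instructions above, relative to the repository root.
EXPECTED_MODELS = [
    Path("encoder/saved_models/pretrained.pt"),
    Path("vocoder/saved_models/pretrained/pretrained.pt"),
]


def missing_models(repo_root="."):
    """Return the expected model files that are not present under repo_root.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_MODELS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    for path in missing:
        print(f"Missing pretrained model: {path}")
    if not missing:
        print("Both pretrained models are in place.")
```

If either file is reported missing, place it at the listed path before running the preprocessing steps.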

  1. Download and uncompress the VCTK dataset.
  2. Manually split the data into train and test sets (there is no official data split). Place the speaker folders under <datasets_root>/VCTK/train and <datasets_root>/VCTK/test, e.g., <datasets_root>/VCTK/train/p227 and <datasets_root>/VCTK/test/p228.
  3. Run python synthesizer_preprocess_audio.py <datasets_root>
  4. Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_train
  5. Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_test
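The manual split in step 2 could be scripted along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: it assumes the split is made by speaker (as the example paths p227 and p228 suggest), that the uncompressed corpus has one folder per speaker, and that you choose the held-out speakers yourself. The function name split_speakers and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def split_speakers(speaker_src, datasets_root, test_speakers):
    """Copy each speaker folder into VCTK/train or VCTK/test under datasets_root.

    speaker_src:   folder containing one sub-folder per speaker (assumed layout).
    datasets_root: the <datasets_root> used by the preprocessing scripts.
    test_speakers: set of speaker folder names to hold out for testing.
    Returns a dict listing which speakers went to each subset.
    """
    src = Path(speaker_src)
    out = Path(datasets_root) / "VCTK"
    placed = {"train": [], "test": []}
    for spk_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        subset = "test" if spk_dir.name in test_speakers else "train"
        dest = out / subset / spk_dir.name
        # dirs_exist_ok requires Python 3.8 or newer.
        shutil.copytree(spk_dir, dest, dirs_exist_ok=True)
        placed[subset].append(spk_dir.name)
    return placed
```

For example, split_speakers("path/to/speaker_folders", "path/to/datasets_root", {"p228"}) would place p228 under VCTK/test and every other speaker under VCTK/train; both paths here are placeholders. The sketch copies rather than moves, so the original download stays untouched.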

Training and inference

To launch training:

$ python synthesizer_train.py vc_adversarial <datasets_root>/SV2TTS/synthesizer_train

To run inference, use synthesis_ppg_script.py. Set syn_dir to the path of the trained model, e.g., synthesizer/saved_models/logs-train_adversarial_vctk/taco_pretrained.

Acknowledgement

The code is adapted from CorentinJ / Real-Time-Voice-Cloning.

Cite the work

@article{dingimproving,
  title={Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition},
  author={Ding, Shaojin and Zhao, Guanlong and Gutierrez-Osuna, Ricardo}
}