
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

This is the official implementation of the paper Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations. You can find the demo webpage here, and the pretrained model here.

Dependencies

Preprocess

Our model is trained on the CSTR VCTK Corpus.

Feature extraction

We use the code from Kyubyong/tacotron to extract features. The default parameters can be found in preprocess/tacotron/norm_utils.py.
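
For orientation, here is a minimal sketch of tacotron-style spectrogram extraction with librosa. All parameter values below (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions; the authoritative defaults are in preprocess/tacotron/norm_utils.py.

import librosa
import numpy as np

# Illustrative parameters only; see preprocess/tacotron/norm_utils.py
# for the values the repository actually uses.
SR = 16000        # sample rate (assumed)
N_FFT = 1024      # FFT window size (assumed)
HOP_LENGTH = 256  # hop between frames (assumed)
N_MELS = 80       # number of mel bands (assumed)

def extract_features(wav_path):
    """Return (log-mel, log-linear) magnitude spectrograms, time-major."""
    y, _ = librosa.load(wav_path, sr=SR)
    # Linear-frequency magnitude spectrogram via the STFT.
    linear = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP_LENGTH))
    # Project onto a mel filterbank to get the mel spectrogram.
    mel = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=N_MELS) @ linear
    # Log compression, floored to avoid log(0).
    log_mel = np.log(np.maximum(mel, 1e-10)).T
    log_linear = np.log(np.maximum(linear, 1e-10)).T
    return log_mel, log_linear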

The preprocessing configuration is at preprocess/vctk.config. Edit its fields to match your local paths, then run preprocess.sh to preprocess the dataset.
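
For illustration only, such a config typically points the scripts at the raw corpus and an output location. The field names below are hypothetical, not the actual keys in preprocess/vctk.config:

# Hypothetical field names, for illustration only.
raw_data_dir=/path/to/VCTK-Corpus/wav48
output_dir=/path/to/preprocessed_features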

Training

You can start training by running main.py; the accepted command-line arguments are defined in the script.
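
For intuition about what training does, here is a heavily simplified sketch of the paper's adversarial disentanglement idea: a speaker classifier learns to identify the speaker from the encoder's code, while the encoder/decoder learn to reconstruct the input yet fool the classifier. Every name and the loss weight below are illustrative assumptions, not the actual modules or hyperparameters in main.py.

import torch.nn.functional as F

def disentangle_step(encoder, decoder, classifier, enc_dec_opt, clf_opt,
                     spec, speaker_id):
    """One simplified adversarial-disentanglement step (illustrative only)."""
    # 1) Update the classifier to predict the speaker from the content code.
    code = encoder(spec).detach()          # freeze the encoder for this update
    clf_loss = F.cross_entropy(classifier(code), speaker_id)
    clf_opt.zero_grad()
    clf_loss.backward()
    clf_opt.step()

    # 2) Update encoder/decoder: reconstruct the spectrogram conditioned on
    #    the true speaker, while making the code uninformative to the
    #    classifier (the adversarial term maximizes the classifier's loss).
    code = encoder(spec)
    recon = decoder(code, speaker_id)
    recon_loss = F.l1_loss(recon, spec)
    adv_loss = -F.cross_entropy(classifier(code), speaker_id)
    total = recon_loss + 0.01 * adv_loss   # 0.01 is an assumed weight
    enc_dec_opt.zero_grad()
    total.backward()
    enc_dec_opt.step()
    return recon_loss.item(), clf_loss.item()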

Testing

You can run inference by executing python3 test.py; see the script for the accepted command-line arguments.
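
Conceptually, conversion encodes a source utterance into a speaker-independent code and decodes it with the target speaker's identity. The sketch below illustrates that flow with hypothetical names; the real entry point and arguments are in test.py.

import torch

def convert(encoder, decoder, source_spec, target_speaker_id):
    """Convert source_spec to the target speaker's voice (illustrative)."""
    with torch.no_grad():
        # Disentangled, speaker-independent content representation.
        code = encoder(source_spec)
        # Decode the same content conditioned on the desired target speaker.
        speaker = torch.tensor([target_speaker_id])
        return decoder(code, speaker)

The model outputs spectrogram features, so a phase-reconstruction step such as Griffin-Lim (included in the Kyubyong/tacotron utilities) is typically used to synthesize the final waveform.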

Reference

Please cite our paper if you find this repository useful.

@article{chou2018multi,
  title={Multi-target voice conversion without parallel data by adversarially learning disentangled audio representations},
  author={Chou, Ju-chieh and Yeh, Cheng-chieh and Lee, Hung-yi and Lee, Lin-shan},
  journal={arXiv preprint arXiv:1804.02812},
  year={2018}
}

Contact

If you have any questions about the paper or the code, feel free to email me at jjery2243542@gmail.com.