
Voice-conversion-and-morphing-RelGAN

An implementation of voice conversion and morphing using RelGAN (an image-to-image translation model) with TensorFlow.

This enables many-to-many voice conversion and voice morphing.

This is still experimental.

Details page (in Japanese)

Original papers and pages

Related papers and pages

Original implementations

Usage

  1. Put folders containing the WAV files for training in a directory named datasets.

     Three or more speaker folders are required.

     Also put folders containing a few WAV files for validation in datasets_val.

     The layout looks like this (a small sanity-check sketch follows the tree):

...
│
datasets
|   │
|   ├── speaker_1
|   │     ├── wav1_1.wav
|   │     ├── wav1_2.wav
|   │     ├── ...
|   │     └── wav1_i.wav
|   ├── speaker_2
|   │     ├── wav2_1.wav
|   │     ├── wav2_2.wav
|   │     ├── ...
|   │     └── wav2_j.wav 
|   ...
|   └── speaker_N
|         ├── wavN_1.wav
|         ├── wavN_2.wav
|         ├── ...
|         └── wavN_k.wav    
datasets_val
|   │
|   ├── speaker_1
|   │     ├── wav1_i+1.wav
|   │     ├── wav1_i+2.wav
|   │     ├── ...
|   │     └── wav1_i+5.wav
|   ├── speaker_2
|   │     ├── wav2_j+1.wav
|   │     ├── wav2_j+2.wav
|   │     ├── ...
|   │     └── wav2_j+3.wav 
|   ...
|   └── speaker_N
|         ├── wavN_k+1.wav
|         ├── wavN_k+2.wav
|         ├── ...
|         └── wavN_k+4.wav 
...
├── preprocess1.py     
├── preprocess2.py
...
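Before running the preprocessing scripts, it can help to sanity-check this layout. Below is a minimal sketch; the helper name check_layout is hypothetical and not part of the repository:

```python
import os

def check_layout(root, min_speakers=3):
    """Verify that root holds at least min_speakers folders of .wav files."""
    speakers = [d for d in sorted(os.listdir(root))
                if os.path.isdir(os.path.join(root, d))]
    assert len(speakers) >= min_speakers, (
        f"{root}: need {min_speakers}+ speaker folders, found {len(speakers)}")
    for spk in speakers:
        wavs = [f for f in os.listdir(os.path.join(root, spk))
                if f.endswith(".wav")]
        print(f"{root}/{spk}: {len(wavs)} wav files")

check_layout("datasets")
check_layout("datasets_val")
```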
  2. Run preprocess1.py to remove silence and split the files.
python preprocess1.py
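preprocess1.py is part of the repository, so the following is only an illustrative sketch of the kind of silence removal and splitting it performs. It assumes librosa's effects.split for silence detection; the actual script may work differently:

```python
import librosa
import soundfile as sf

# Load one training wav at its native sample rate.
y, sr = librosa.load("datasets/speaker_1/wav1_1.wav", sr=None)

# Find non-silent intervals (everything within 30 dB of the peak)
# and write each interval out as its own clip.
intervals = librosa.effects.split(y, top_db=30)
for i, (start, end) in enumerate(intervals):
    sf.write(f"wav1_1_split_{i}.wav", y[start:end], sr)
```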
  3. Run preprocess2.py to extract features and output pickles.
python preprocess2.py
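Since this implementation is based on CycleGAN-VC2, preprocess2.py presumably extracts WORLD vocoder features (F0, spectral envelope, aperiodicity). The sketch below shows that style of extraction with pyworld; the exact features, dimensions, and pickle layout used by the repository are assumptions:

```python
import pickle

import librosa
import numpy as np
import pyworld as pw

# Load a split clip and cast to float64, which pyworld requires.
x, fs = librosa.load("wav1_1_split_0.wav", sr=22050)
x = x.astype(np.float64)

f0, t = pw.harvest(x, fs)         # fundamental frequency contour
sp = pw.cheaptrick(x, f0, t, fs)  # smoothed spectral envelope
ap = pw.d4c(x, f0, t, fs)         # band aperiodicity
mcep = pw.code_spectral_envelope(sp, fs, 36)  # 36-dim coded envelope

with open("speaker_1.pickle", "wb") as f:
    pickle.dump({"f0": f0, "mcep": mcep, "ap": ap}, f)
```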
  4. Train RelGAN-VM.
python train_relgan_vm.py
  5. After training, inference can be performed.

     The source attribute and target attribute must be specified.

     In the example below, WAV files of the 2nd attribute (datasets_val/speaker_2) will be 60% converted toward the 4th attribute (probably speaker_4).

     Note that labels are zero-indexed.

python eval_relgan_vm.py --source_label 1 --target_label 3 --interpolation 0.6
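RelGAN conditions its generator on a relative attribute vector, alpha * (one_hot(target) - one_hot(source)), rather than an absolute target label; this is what makes partial (here 60%) conversion possible. A minimal NumPy sketch of how the flags above could map to that vector, with num_speakers=4 matching the four training voices mentioned below (the exact construction inside eval_relgan_vm.py is an assumption):

```python
import numpy as np

def relative_attribute(source_label, target_label, interpolation,
                       num_speakers=4):
    """alpha * (one_hot(target) - one_hot(source)) over the attributes."""
    src = np.eye(num_speakers)[source_label]
    tgt = np.eye(num_speakers)[target_label]
    return interpolation * (tgt - src)

# Matches the command above: speaker_2 -> speaker_4 at 60%.
print(relative_attribute(1, 3, 0.6))  # yields [0, -0.6, 0, 0.6]
```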

Result examples

Examples trained on the JVS (Japanese versatile speech) corpus are located in result_examples.

The following four voices were used for training.

Examples are also available on YouTube.

Acknowledgements

This implementation is based on njellinas's CycleGAN-VC2.

It was created with advice from Lgeu.