An implementation of voice conversion and morphing using RelGAN (an image-to-image translation model) with TensorFlow.
It enables many-to-many voice conversion and voice morphing.
This project is still experimental.
Details page (in Japanese)
Put each speaker's folder of training wav files in a directory named datasets.
Three or more speaker folders are required.
Also put a folder containing a few wav files per speaker for validation in datasets_val.
Like this:
...
├── datasets
│   ├── speaker_1
│   │   ├── wav1_1.wav
│   │   ├── wav1_2.wav
│   │   ├── ...
│   │   └── wav1_i.wav
│   ├── speaker_2
│   │   ├── wav2_1.wav
│   │   ├── wav2_2.wav
│   │   ├── ...
│   │   └── wav2_j.wav
│   ├── ...
│   └── speaker_N
│       ├── wavN_1.wav
│       ├── wavN_2.wav
│       ├── ...
│       └── wavN_k.wav
├── datasets_val
│   ├── speaker_1
│   │   ├── wav1_i+1.wav
│   │   ├── wav1_i+2.wav
│   │   ├── ...
│   │   └── wav1_i+5.wav
│   ├── speaker_2
│   │   ├── wav2_j+1.wav
│   │   ├── wav2_j+2.wav
│   │   ├── ...
│   │   └── wav2_j+3.wav
│   ├── ...
│   └── speaker_N
│       ├── wavN_k+1.wav
│       ├── wavN_k+2.wav
│       ├── ...
│       └── wavN_k+4.wav
├── preprocess1.py
├── preprocess2.py
└── ...
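Before running the preprocessing scripts, it can help to verify that the folders match the layout above. The following is a minimal sketch (the helper name `check_dataset_layout` is illustrative, not part of this repository); it only assumes the `datasets` / `datasets_val` directories described above:

```python
from pathlib import Path

def check_dataset_layout(root, min_speakers=3):
    """Return speaker folder names under `root`, raising if the layout
    does not match what the training setup expects."""
    speakers = sorted(d for d in Path(root).iterdir() if d.is_dir())
    if len(speakers) < min_speakers:
        raise ValueError(
            f"{root} has {len(speakers)} speaker folders; need >= {min_speakers}")
    for d in speakers:
        if not list(d.glob("*.wav")):
            raise ValueError(f"{d} contains no .wav files")
    return [d.name for d in speakers]
```

For example, `check_dataset_layout("datasets")` would raise with fewer than three speaker folders, matching the requirement above.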
python preprocess1.py
python preprocess2.py
python train_relgan_vm.py
After training, inference can be performed.
The source attribute and target attribute must be specified.
In the example below, wav files of the 2nd attribute (datasets_val/speaker_2) are 60% converted toward the 4th attribute (probably speaker_4).
Note that the indices are 0-origin.
python eval_relgan_vm.py --source_label 1 --target_label 3 --interpolation 0.6
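RelGAN conditions its generator on a relative attribute vector, the difference between the target and source one-hot labels, and the interpolation rate scales that vector to morph between voices. A minimal NumPy sketch of how the flags above may map to such a vector (the function name is illustrative, not this repository's API):

```python
import numpy as np

def relative_attribute(source_label, target_label, num_speakers, interpolation=1.0):
    """Build a RelGAN-style relative attribute vector from 0-origin labels."""
    src = np.eye(num_speakers)[source_label]  # one-hot source speaker
    tgt = np.eye(num_speakers)[target_label]  # one-hot target speaker
    # interpolation = 1.0 is a full conversion; 0 < interpolation < 1 morphs
    # part of the way from the source voice toward the target voice.
    return interpolation * (tgt - src)
```

With four speakers, the example flags `--source_label 1 --target_label 3 --interpolation 0.6` would correspond to the vector `[0, -0.6, 0, 0.6]`.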
Examples trained on the JVS (Japanese versatile speech) corpus are located in result_examples.
The following four voices were used for training.
Example outputs are available on YouTube.
This implementation is based on njellinas's CycleGAN-VC2.
It was created with the advice of Lgeu.