ZeroSpeech 2020
Implementation of our submission to ZeroSpeech 2020 (Hou et al.)
Results: https://zerospeech.com/2020/results.html
The system combines a hierarchical VQ-VAE encoder, which discovers discrete spoken word units from speech, with a MelGAN vocoder that generates waveforms directly. The two components are trained separately. During VQ-VAE training, the speaker ID is injected before the VQ-VAE decoder, which helps reduce the speaker information encoded in the word units.
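As a minimal sketch (not the repository's actual implementation, which is hierarchical and learned end to end), the two ideas in the description can be illustrated with NumPy: vector quantization maps each encoder frame to its nearest codebook entry, yielding discrete unit indices, and a speaker embedding is concatenated to the quantized latents before the decoder so the codebook need not carry speaker identity. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_codes, latent_dim, num_speakers = 8, 4, 2

codebook = rng.normal(size=(num_codes, latent_dim))          # learned in practice
speaker_table = rng.normal(size=(num_speakers, latent_dim))  # speaker embeddings

def quantize(encoder_out):
    """Map each frame (row) to the index of its nearest codebook vector."""
    # Squared L2 distance between every frame and every code: shape (T, K)
    dists = ((encoder_out[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)       # discrete "word unit" ids, shape (T,)
    return indices, codebook[indices]    # quantized latents, shape (T, D)

def condition_on_speaker(quantized, speaker_id):
    """Concatenate the speaker embedding to every frame before decoding."""
    spk = np.broadcast_to(speaker_table[speaker_id], quantized.shape)
    return np.concatenate([quantized, spk], axis=1)  # shape (T, 2*D)

frames = rng.normal(size=(6, latent_dim))  # stand-in for encoder output
ids, z_q = quantize(frames)
decoder_in = condition_on_speaker(z_q, speaker_id=1)
```

Because the decoder receives the speaker identity explicitly, the quantized codes are free to encode content rather than voice; the repository realizes this with learned embeddings rather than the fixed tables above.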
Usage:
- Run scripts/data_manifest.sh (set datadir to the root directory of the raw dataset)
- python trainer.py --mode vqvae --language [english/surprise] --ckpt_path [path-to-save-model] --datadir [path-to-the-root-dir-of-dataset]
- python trainer.py --mode melgan --load_vqvae True --pretrained_vqvae [path-to-vqvae-ckpts] --language [english/surprise] --ckpt_path [path-to-save-model] --datadir [path-to-the-root-dir-of-dataset]
- python evaluator.py --language [english/surprise] --datadir [path-to-the-root-dir-of-dataset] --vqvae_model [path-to-vqvae-ckpts] --melgan_model [path-to-melgan-ckpts] --save_path [path-to-save-generated-results]
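For reference, the three commands above could be chained from Python. The helper below only builds the argument lists (it runs nothing); the flag names are taken verbatim from the usage lines, while all path values are caller-supplied placeholders. It makes the flag dependencies explicit: the MelGAN step and the evaluator both consume the VQ-VAE checkpoint produced in the first step.

```python
def build_pipeline(language, datadir, vqvae_ckpt, melgan_ckpt, save_path):
    """Return the trainer/evaluator invocations in the required order."""
    vqvae = ["python", "trainer.py", "--mode", "vqvae",
             "--language", language, "--ckpt_path", vqvae_ckpt,
             "--datadir", datadir]
    melgan = ["python", "trainer.py", "--mode", "melgan",
              "--load_vqvae", "True", "--pretrained_vqvae", vqvae_ckpt,
              "--language", language, "--ckpt_path", melgan_ckpt,
              "--datadir", datadir]
    evaluate = ["python", "evaluator.py", "--language", language,
                "--datadir", datadir, "--vqvae_model", vqvae_ckpt,
                "--melgan_model", melgan_ckpt, "--save_path", save_path]
    return [vqvae, melgan, evaluate]

# Each list can then be passed to subprocess.run(...) in sequence.
commands = build_pipeline("english", "data/", "ckpt/vqvae",
                          "ckpt/melgan", "out/")
```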
Referenced repositories:
- vq-vae-2-pytorch: https://github.com/rosinality/vq-vae-2-pytorch
- melgan-neurips: https://github.com/descriptinc/melgan-neurips
- ZeroSpeech-TTS-without-T: https://github.com/andi611/ZeroSpeech-TTS-without-T
- VQ-VAE-Speech: https://github.com/swasun/VQ-VAE-Speech
- pytorch-vqvae: https://github.com/houwenxin/pytorch-vqvae