ZeroSpeech 2020
Implementation of our submission to ZeroSpeech 2020 (Hou et al.)
Results: https://zerospeech.com/2020/results.html
The system combines a hierarchical VQ-VAE encoder, which discovers discrete spoken word units from speech, with a MelGAN vocoder that generates waveforms directly. The two components are trained separately. During VQ-VAE training, the speaker ID is injected before the VQ-VAE decoder, which helps reduce the speaker information encoded in the word units.
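As a minimal sketch (not the repository's actual implementation, which is hierarchical and learned end to end), the two ideas in the description can be illustrated with NumPy: vector quantization maps each encoder frame to its nearest codebook entry, yielding discrete unit indices, and a speaker embedding is concatenated to the quantized latents before the decoder so the codebook need not carry speaker identity. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_codes, latent_dim, num_speakers = 8, 4, 2

codebook = rng.normal(size=(num_codes, latent_dim))          # learned in practice
speaker_table = rng.normal(size=(num_speakers, latent_dim))  # speaker embeddings

def quantize(encoder_out):
    """Map each frame (row) to the index of its nearest codebook vector."""
    # Squared L2 distance between every frame and every code: shape (T, K)
    dists = ((encoder_out[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)       # discrete "word unit" ids, shape (T,)
    return indices, codebook[indices]    # quantized latents, shape (T, D)

def condition_on_speaker(quantized, speaker_id):
    """Concatenate the speaker embedding to every frame before decoding."""
    spk = np.broadcast_to(speaker_table[speaker_id], quantized.shape)
    return np.concatenate([quantized, spk], axis=1)  # shape (T, 2*D)

frames = rng.normal(size=(6, latent_dim))  # stand-in for encoder output
ids, z_q = quantize(frames)
decoder_in = condition_on_speaker(z_q, speaker_id=1)
```

Because the decoder receives the speaker identity explicitly, the quantized codes are free to encode content rather than voice; the repository realizes this with learned embeddings rather than the fixed tables above.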
Usage:
- Run scripts/data_manifest.sh (set datadir to the root directory of the raw dataset)
- python trainer.py --mode vqvae --language [english/surprise] --ckpt_path [path-to-save-model] --datadir [path-to-the-root-dir-of-dataset]
- python trainer.py --mode melgan --load_vqvae True --pretrained_vqvae [path-to-vqvae-ckpts] --language [english/surprise] --ckpt_path [path-to-save-model] --datadir [path-to-the-root-dir-of-dataset]
- python evaluator.py --language [english/surprise] --datadir [path-to-the-root-dir-of-dataset] --vqvae_model [path-to-vqvae-ckpts] --melgan_model [path-to-melgan-ckpts] --save_path [path-to-save-generated-results]
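For reference, the three commands above could be chained from Python. The helper below only builds the argument lists (it runs nothing); the flag names are taken verbatim from the usage lines, while all path values are caller-supplied placeholders. It makes the flag dependencies explicit: the MelGAN step and the evaluator both consume the VQ-VAE checkpoint produced in the first step.

```python
def build_pipeline(language, datadir, vqvae_ckpt, melgan_ckpt, save_path):
    """Return the trainer/evaluator invocations in the required order."""
    vqvae = ["python", "trainer.py", "--mode", "vqvae",
             "--language", language, "--ckpt_path", vqvae_ckpt,
             "--datadir", datadir]
    melgan = ["python", "trainer.py", "--mode", "melgan",
              "--load_vqvae", "True", "--pretrained_vqvae", vqvae_ckpt,
              "--language", language, "--ckpt_path", melgan_ckpt,
              "--datadir", datadir]
    evaluate = ["python", "evaluator.py", "--language", language,
                "--datadir", datadir, "--vqvae_model", vqvae_ckpt,
                "--melgan_model", melgan_ckpt, "--save_path", save_path]
    return [vqvae, melgan, evaluate]

# Each list can then be passed to subprocess.run(...) in sequence.
commands = build_pipeline("english", "data/", "ckpt/vqvae",
                          "ckpt/melgan", "out/")
```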
Referenced repositories:
- vq-vae-2-pytorch: https://github.com/rosinality/vq-vae-2-pytorch
- melgan-neurips: https://github.com/descriptinc/melgan-neurips
- ZeroSpeech-TTS-without-T: https://github.com/andi611/ZeroSpeech-TTS-without-T
- VQ-VAE-Speech: https://github.com/swasun/VQ-VAE-Speech
- pytorch-vqvae: https://github.com/houwenxin/pytorch-vqvae