k2kobayashi / crank

A toolkit for non-parallel voice conversion based on a vector-quantized variational autoencoder
MIT License

Neural vocoder support #2

Closed: unilight closed this 4 years ago

unilight commented 4 years ago

Let's discuss neural vocoder support.

  1. Types
  2. Usage & Structure: We can add a synthesis stage to the recipe, provide pretrained models for users to download, and use an argument like voc_expdir to load the pretrained model (see the sketch below). In addition, since kan-bayashi has also packed training code into the PWG package, we can provide recipes for users to train their own vocoders if they want. One example design could be like egs/pwg/vcc2018.
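To make the usage concrete, here is a minimal sketch of what such a synthesis stage could look like. It assumes the kan-bayashi/ParallelWaveGAN package exposes a load_model helper and that its models provide an inference method; the checkpoint filename and the voc_expdir layout are placeholders, not the actual design of this repo.

```python
# Minimal sketch of a synthesis stage loading a pretrained vocoder from voc_expdir.
# The filenames below are placeholders; adjust to the actual pretrained model layout.
import torch
import yaml
from parallel_wavegan.utils import load_model  # assumed PWG helper


def synthesize(mel, voc_expdir, device="cpu"):
    """Convert a mel-spectrogram (frames x n_mels) to a waveform with a pretrained PWG."""
    with open(f"{voc_expdir}/config.yml") as f:
        config = yaml.safe_load(f)
    vocoder = load_model(f"{voc_expdir}/checkpoint-400000steps.pkl", config)
    vocoder.remove_weight_norm()
    vocoder = vocoder.eval().to(device)
    with torch.no_grad():
        wav = vocoder.inference(torch.tensor(mel, dtype=torch.float, device=device))
    return wav.view(-1).cpu().numpy(), config["sampling_rate"]
```
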
k2kobayashi commented 4 years ago

It's nice to integrate PWG/MelGAN. Actually, I have already discussed it with @kan-bayashi. He said he can create a recipe and a pre-trained model after the vcc2020 dataset is released.

For the structure, I think the following is nice.

unilight commented 4 years ago

I see. The structure looks okay to me. How about a recipe for training the neural vocoders? Also, if it's okay, I can implement the recipes first and have you and kan-bayashi revise them.

k2kobayashi commented 4 years ago

Training neural vocoders is out of scope for this repository. You can either contribute to kan-bayashi/ParallelWaveGAN or train and upload a pre-trained model anywhere you like. I think he will help you contribute a PWG recipe. Of course, I can revise the source code for the crank repo.

unilight commented 4 years ago

I see. I will train vocoders in kan-bayashi/ParallelWaveGAN and just provide links to the pretrained models in this repo. I will work on it next.
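Since the plan is to host only download links here, a small helper for fetching a pretrained vocoder into a local directory (usable as voc_expdir) might look like the sketch below. The URL, archive name, and destination path are hypothetical placeholders, not actual release links.

```python
# Hypothetical sketch: download and unpack a pretrained vocoder archive so it
# can be passed as voc_expdir. Paths and URL are placeholders.
import tarfile
import urllib.request
from pathlib import Path


def download_pretrained_vocoder(url, dest="downloads/pwg_vcc2018"):
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "model.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)  # fetch the archive once
        with tarfile.open(archive) as tar:
            tar.extractall(dest_dir)  # expects config.yml + checkpoint inside
    return dest_dir


# usage with a placeholder URL:
# voc_expdir = download_pretrained_vocoder("https://example.com/pwg_vcc2018.tar.gz")
```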

k2kobayashi commented 4 years ago

Let me know when you have trained the neural vocoder.

unilight commented 4 years ago

I have trained a PWG for VCC2018. I will send a PR later.