patrickltobing / cyclevae-vc-neuralvoco

Apache License 2.0

Low-latency real-time multispeaker voice conversion (VC) with cyclic variational autoencoder (CycleVAE) and multiband WaveRNN using data-driven linear prediction (MWDLP)

Requirements:

Installation

$ cd tools
$ make
$ cd ..

Latest version

Compilable demo

Samples from compilable demo

Steps to build the models:

  1. Data preparation and preprocessing
  2. VC and neural vocoder model training [~ 2.5 days and ~ 4 days, respectively]
  3. VC fine-tuning with fixed neural vocoder [~ 2.5 days]
  4. VC decoder fine-tuning with fixed encoder and neural vocoder [~ 2.5 days]
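
As a rough planning aid, the durations quoted in the steps above can be added up. The sketch below is only an illustration: the names `total_days` and `parallel_step2` are hypothetical and not part of this repository, and it assumes that steps 2 to 4 run one after another, with the two trainings in step 2 either run back to back or side by side on separate devices. Step 1 is left out because no duration is stated for it.

```python
# Approximate durations in days, copied from the step list above.
VC_TRAIN = 2.5          # step 2, VC model
VOCODER_TRAIN = 4.0     # step 2, neural vocoder
VC_FINETUNE = 2.5       # step 3, VC fine-tuning with fixed neural vocoder
DECODER_FINETUNE = 2.5  # step 4, VC decoder fine-tuning


def total_days(parallel_step2: bool) -> float:
    """Rough wall-clock estimate for steps 2-4 (hypothetical helper)."""
    # Assumption: if two devices are available, the VC and vocoder models
    # in step 2 can be trained at the same time; otherwise back to back.
    if parallel_step2:
        step2 = max(VC_TRAIN, VOCODER_TRAIN)
    else:
        step2 = VC_TRAIN + VOCODER_TRAIN
    return step2 + VC_FINETUNE + DECODER_FINETUNE


print(total_days(parallel_step2=False))  # 11.5
print(total_days(parallel_step2=True))   # 9.0
```

Under these assumptions the full pipeline takes roughly 9 to 11.5 days of training time; actual times depend on hardware and dataset size.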

Steps for real-time low-latency decoding with CPU:

  1. Dump and compile models
  2. Decode

The real-time implementation is based on LPCNet.

Details

Please see egs/cycvae_mwdlp_vcc20/README.md for more details on VC + neural vocoder, or egs/mwdlp_vcc20/README.md for more details on the neural vocoder only.

References

[1] High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling

[2] Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction

Contact

Patrick Lumban Tobing

patrickltobing@gmail.com

patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp