discrete token for audio resynthesis

kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

https://kan-bayashi.github.io/ParallelWaveGAN/

MIT License


discrete token for audio resynthesis #423

Open South-Twilight opened 7 months ago

South-Twilight commented 7 months ago

Here is the PR for audio resynthesis from discrete tokens: 1) We extend hubert_voc1 to token_voc1 so that it can handle tokens from more models; 2) We add f0 for training and inference, after finding poor pronunciation in singing; 3) We add multi-stream methods, including residual clustering and weighted sum; 4) Using the embedding features of the models is also allowed.

The following models have been validated in the opencpop recipe: HuBERT, XLS-R, WavLM, MERT, Encodec.
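
For readers unfamiliar with the two multi-stream methods named in the list above, here is a minimal NumPy sketch of what "weighted sum" and "residual clustering" commonly mean. It is not taken from the PR: the function names `weighted_sum_streams` and `residual_cluster`, the softmax normalization of the stream weights, and the nearest-centroid assignment are all assumptions made for illustration, and the actual token_voc1 code may differ.

```python
import numpy as np


def weighted_sum_streams(token_streams, embedding_tables, stream_logits):
    """Fuse several token streams into one feature sequence (hypothetical helper).

    token_streams:    int array, shape (num_streams, num_frames)
    embedding_tables: float array, shape (num_streams, vocab_size, dim)
    stream_logits:    float array, shape (num_streams,), unnormalized weights
    """
    # Assumption: weights are softmax-normalized so they are positive and sum to 1.
    weights = np.exp(stream_logits - stream_logits.max())
    weights = weights / weights.sum()
    # Look up each stream's embeddings: (num_streams, num_frames, dim).
    embedded = np.stack(
        [table[ids] for table, ids in zip(embedding_tables, token_streams)]
    )
    # Contract the stream axis: result has shape (num_frames, dim).
    return np.tensordot(weights, embedded, axes=1)


def residual_cluster(features, codebooks):
    """Quantize features in stages, each stage coding the previous residual.

    features:  float array, shape (num_frames, dim)
    codebooks: list of float arrays, each of shape (num_codes, dim)
    Returns (tokens, residual), where tokens has shape (num_stages, num_frames).
    """
    residual = np.asarray(features, dtype=float).copy()
    tokens = []
    for codebook in codebooks:
        # Squared Euclidean distance to every centroid: (num_frames, num_codes).
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        ids = dists.argmin(axis=1)
        tokens.append(ids)
        # What the chosen centroid did not explain is passed to the next stage.
        residual = residual - codebook[ids]
    return np.stack(tokens), residual
```

In this sketch, each stage of `residual_cluster` yields one token stream, and `weighted_sum_streams` is one way to merge such streams (or streams from different layers or models) back into a single frame-level input for the vocoder.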