Closed lsq357 closed 5 years ago
Not currently planned. I wish I had more time..
I will leave this open to track progress on it. Not currently planned, though.
Seems like someone is trying to support the WORLD vocoder: https://github.com/geneing/deepvoice3_pytorch
@r9y9 Thanks for the heads up!
I'm actually really interested in how this turns out, since the WORLD vocoder is used in the "UTAU" singing software. If the network can be trained successfully with it, I think we might be able to get rid of the "sound compression" artifacts present in most current deepvoice/tacotron implementations...
An example of the sound quality possible with UTAU (and therefore WORLD): https://www.youtube.com/watch?v=Es_5kvVtiNA
@geneing would you mind keeping us updated with your progress? Even if the results are not good.
Replacing Griffin-Lim with the WORLD vocoder seems fairly straightforward. The full transform for a 22 kHz signal has length 1027, versus 80 for the mel output. The WORLD vocoder includes encoders for the aperiodicity and the spectral envelope, which reduce the output to length 131.
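For reference, the 1027 figure can be reproduced from WORLD's defaults. This is a sketch; the FFT size of 1024 at 22.05 kHz is an assumption based on CheapTrick's usual default, not taken from the thread:

```python
fs = 22050
fft_size = 1024               # assumed CheapTrick default at 22.05 kHz
bins = fft_size // 2 + 1      # 513 bins in a one-sided spectrum

f0_dim = 1                    # one F0 value per frame
sp_dim = bins                 # spectral envelope
ap_dim = bins                 # bin-wise aperiodicity

full_dim = f0_dim + sp_dim + ap_dim
print(full_dim)               # 1027, matching the full-transform length above
```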
In my view, switching to the WORLD vocoder only changes the network's output shape and makes it multi-output, since WORLD needs at least three parameters (F0, aperiodicity, spectrogram). Moreover, adding the WORLD parameters (F0, aperiodicity, spectrogram) together with the mel outputs to the loss function might speed up convergence. (This is just my guess!)
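The multi-output loss idea could be sketched as a weighted sum of per-stream losses. The stream names, dimensions, and the choice of L1 here are hypothetical illustrations, not code from this repo:

```python
import numpy as np

def world_loss(pred, target, weights=None):
    """Weighted sum of per-stream L1 losses over the WORLD parameters
    (f0, aperiodicity, spectrogram) plus the mel output."""
    streams = ("f0", "aperiodicity", "spectrogram", "mel")
    weights = weights or {s: 1.0 for s in streams}
    return sum(weights[s] * np.mean(np.abs(pred[s] - target[s]))
               for s in streams)

# Identical prediction and target give zero loss.
frames = 10
t = {"f0": np.zeros((frames, 1)),
     "aperiodicity": np.zeros((frames, 513)),
     "spectrogram": np.zeros((frames, 513)),
     "mel": np.zeros((frames, 80))}
print(world_loss(t, t))  # 0.0
```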
BTW if anyone is interested in singing neural networks. Then I just found this: http://www.dtic.upf.edu/~mblaauw/NPSS/
The Spanish output sounds really awesome, I think. The English and Japanese outputs sound a little too stilted, but I guess that depends on what kind of dataset and music you throw at it.
Edit: forgot to mention that it seems to use the WORLD vocoder
Judging from the Tacotron 2 paper, it appears that WaveNet may be a better choice. Looking into it.
For me, WaveNet needs many more GPUs to train (Tacotron 2 used 32 GPUs), and the WORLD vocoder can run on CPU only.
Does anybody have experience working on WaveNet? Is it impossible to train WaveNet with only 1 GPU in practice?
I trained WaveNet on two 1080 Ti GPUs; it only gets through 3k+ steps per day (async updates) with batch size 32.
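At that rate the wall-clock cost adds up quickly. A back-of-the-envelope check, where the 200k-step target is my assumption (roughly what WaveNet-style vocoders are often trained for), not a number from this thread:

```python
steps_per_day = 3000          # throughput reported above (two 1080 Ti GPUs)
target_steps = 200_000        # assumed training target
days = target_steps / steps_per_day
print(round(days))            # ~67 days at this throughput
```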
I tried QuasiRNN + WaveNet as in DeepVoice 2 / DeepVoice, but my TensorFlow QuasiRNN code did not give a speedup. I trained for only a week, without success.
I started to implement the WaveNet vocoder. Check out https://github.com/r9y9/wavenet_vocoder/issues/1#issuecomment-354586299 if you are interested.
@geneing Have you trained your model with "world"? Could you provide some audio samples?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I made one myself: https://github.com/hash2430/dv3_world Anyone who needs it is welcome to use it. I will upload sample audios soon.
Any plans for a WORLD vocoder for multi-speaker TTS?