Starting an unconditional generation experiment

lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Apache License 2.0


Starting an unconditional generation experiment #5

Closed by ladium493 4 years ago

ladium493 commented 4 years ago

For unconditional generation, is it enough to change y = self.dilated_conv(y) + conditioner in model.py to y = self.dilated_conv(y)?

And how do I generate samples?

sharvil commented 4 years ago

I haven't experimented too much with unconditional generation yet. You'll have to make the code changes yourself if you want to play with it before I get around to making the changes.

Besides removing the conditioning network, you'll also need to increase the receptive field size so that it covers the entire utterance. Specifically:

increase number of layers to 36
increase dilation cycle length to 12
increase diffusion steps to 200
increase residual channels to 256
use a linearly spaced noise schedule covering [1e-4, 2e-2]
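As a rough check on the receptive-field point, here is a minimal sketch that estimates how many samples a stack of dilated convolutions can see. It assumes a kernel size of 3 and a dilation of 2 ** (layer % cycle_length) per layer; both are assumptions for illustration rather than values read from model.py. The 30-layer, cycle-length-10 configuration is included only as a comparison point, and 16000 samples corresponds to a one-second Speech Commands clip at 16 kHz.

```python
def receptive_field(num_layers, dilation_cycle_length, kernel_size=3):
    """Receptive field, in samples, of a stack of dilated 1-D convolutions.

    Assumes layer i uses dilation 2 ** (i % dilation_cycle_length) and that
    every layer shares the same kernel size (3 is an assumption here).
    """
    field = 1
    for layer in range(num_layers):
        dilation = 2 ** (layer % dilation_cycle_length)
        # Each layer widens the field by (kernel_size - 1) * dilation samples.
        field += (kernel_size - 1) * dilation
    return field


# A one-second Speech Commands clip sampled at 16 kHz.
UTTERANCE_SAMPLES = 16000

smaller_rf = receptive_field(num_layers=30, dilation_cycle_length=10)
suggested_rf = receptive_field(num_layers=36, dilation_cycle_length=12)

print("30 layers, cycle 10:", smaller_rf, "samples")
print("36 layers, cycle 12:", suggested_rf, "samples")
print("covers the utterance:", suggested_rf >= UTTERANCE_SAMPLES)
```

Under these assumptions the 36-layer, cycle-12 stack spans more than one second of 16 kHz audio, while the smaller stack does not.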

All of the changes I listed can be made in params.py and are tuned to the Speech Commands 0-9 dataset. Generating samples is pretty straightforward: in inference.py you'll just drop the spectrogram argument and pass in None to the model.
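The settings listed above could be collected roughly as follows. This is only a sketch: the key names (residual_layers, dilation_cycle_length, residual_channels, noise_schedule) are assumed for illustration and should be checked against the actual params.py, and the schedule is built in plain Python rather than with whatever helper the repository uses.

```python
def linear_noise_schedule(beta_start, beta_end, num_steps):
    """Return num_steps evenly spaced values from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


# Hypothetical override dictionary; the key names are assumptions and may not
# match the fields defined in params.py.
unconditional_overrides = {
    "residual_layers": 36,
    "dilation_cycle_length": 12,
    "residual_channels": 256,
    # 200 diffusion steps, linearly spaced over [1e-4, 2e-2].
    "noise_schedule": linear_noise_schedule(1e-4, 2e-2, 200),
}

schedule = unconditional_overrides["noise_schedule"]
print(len(schedule), schedule[0], schedule[-1])
```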

Let me know if you have more questions. I'd love to hear how your experiment goes!
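To illustrate what passing None in place of the spectrogram looks like at sampling time, here is a framework-free sketch of a standard DDPM ancestral sampling loop. It is not a copy of inference.py: the model(audio, step, spectrogram) call signature, the dummy_model stand-in, and the final clamp to [-1, 1] are all assumptions made for illustration, and a real run would use the trained network and tensors instead of Python lists.

```python
import math
import random


def sample_unconditional(model, betas, num_samples, seed=0):
    """Run DDPM ancestral sampling with no conditioning input.

    model is any callable (audio, step, spectrogram) -> predicted noise;
    this signature is an assumption for the sketch.
    """
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_cum = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_cum.append(running)

    # Start from pure Gaussian noise.
    audio = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    for n in range(len(betas) - 1, -1, -1):
        c1 = 1.0 / math.sqrt(alphas[n])
        c2 = betas[n] / math.sqrt(1.0 - alpha_cum[n])
        # None takes the place of the spectrogram conditioner.
        predicted_noise = model(audio, n, None)
        audio = [c1 * (x - c2 * e) for x, e in zip(audio, predicted_noise)]
        if n > 0:
            sigma = math.sqrt(
                (1.0 - alpha_cum[n - 1]) / (1.0 - alpha_cum[n]) * betas[n]
            )
            audio = [x + sigma * rng.gauss(0.0, 1.0) for x in audio]
    # Keep the waveform in a valid amplitude range.
    return [max(-1.0, min(1.0, x)) for x in audio]


def dummy_model(audio, step, spectrogram):
    """Hypothetical stand-in for the trained network: predicts zero noise."""
    assert spectrogram is None
    return [0.0] * len(audio)


# A short 20-step linear schedule, used only to exercise the loop.
demo_betas = [1e-4 + i * (2e-2 - 1e-4) / 19 for i in range(20)]
waveform = sample_unconditional(dummy_model, demo_betas, num_samples=64)
print(len(waveform))
```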

ladium493 commented 4 years ago

I finally got some results instead of just faint noise. I trained on a smaller dataset (up, down, left, right). T=20 results in some recognizable voices. Thanks for helping!

What about padding the short samples instead of discarding them? If every sample in a minibatch is discarded, training terminates, which is annoying. https://github.com/lmnt-com/diffwave/blob/master/src/diffwave/dataset.py#L56

sharvil commented 4 years ago

Great! Glad to hear you're getting something that resembles speech.

Padding is a valid way to handle short samples, though it is less computationally efficient. Another option, especially if you have a lot of short samples (which it sounds like you do), is to reduce the number of frames to train on (also in params.py).
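The padding option could look roughly like the sketch below, which crops long clips to a fixed length and zero-pads short ones instead of dropping them. The function names (crop_or_pad, collate) and the list-based waveforms are hypothetical and chosen for illustration; they are not the code in dataset.py.

```python
import random


def crop_or_pad(waveform, target_len, pad_value=0.0, rng=random):
    """Return exactly target_len samples from waveform.

    Long clips are cropped at a random offset; short clips are padded at the
    end with pad_value instead of being discarded.
    """
    if len(waveform) >= target_len:
        start = rng.randint(0, len(waveform) - target_len)
        return list(waveform[start:start + target_len])
    return list(waveform) + [pad_value] * (target_len - len(waveform))


def collate(waveforms, target_len):
    """Build a minibatch in which every clip survives, padded if needed."""
    return [crop_or_pad(w, target_len) for w in waveforms]


# One clip shorter than the target and one longer.
batch = collate([[0.1] * 5, [0.2] * 12], target_len=8)
print([len(clip) for clip in batch])
```

Lowering target_len, which corresponds to training on fewer frames, reduces how many clips need padding at all, so the two options can be combined.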

ladium493 commented 4 years ago

Yes, that works. My problems are solved. Thanks for your help!

moiseshorta commented 4 years ago

Hi,

I'm very curious about how to implement unconditional generation for my experiments as well.

Would you have any code modifications that show how to do this?

Thanks so much

sharvil commented 4 years ago

@moiseshorta, have you tried making the changes I described in https://github.com/lmnt-com/diffwave/issues/5#issuecomment-707326100? That should be a good starting point for your experiments.