Closed ladium493 closed 4 years ago
I haven't experimented too much with unconditional generation yet. You'll have to make the code changes yourself if you want to play with it before I get around to making the changes.
Besides removing the conditioning network, you'll also need to increase the receptive field size so that it covers the entire utterance. Specifically:
All of the changes I listed can be made in params.py and are tuned to the Speech Commands 0-9 dataset. Generating samples is pretty straightforward: in inference.py you'll just drop the spectrogram argument and pass in None to the model.
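As a rough check on the receptive-field requirement mentioned above, the sketch below estimates how many samples a stack of dilated convolutions can see. It assumes a kernel size of 3 and dilations that double within each cycle; the names residual_layers and dilation_cycle_length and the example values are illustrative assumptions, not the settings from the list above.

```python
def receptive_field(residual_layers, dilation_cycle_length, kernel_size=3):
    """Receptive field, in samples, of a stack of dilated 1-D convolutions.

    Assumption: layer i uses dilation 2 ** (i % dilation_cycle_length), and
    each layer widens the field by (kernel_size - 1) * dilation samples.
    """
    field = 1
    for layer in range(residual_layers):
        dilation = 2 ** (layer % dilation_cycle_length)
        field += (kernel_size - 1) * dilation
    return field


def covers_utterance(residual_layers, dilation_cycle_length, sample_rate, seconds):
    """True if the receptive field spans a clip of the given duration."""
    clip_samples = int(sample_rate * seconds)
    return receptive_field(residual_layers, dilation_cycle_length) >= clip_samples


# Illustrative values only: a one-second clip sampled at 16 kHz.
print(receptive_field(30, 10), covers_utterance(30, 10, 16000, 1.0))
print(receptive_field(36, 12), covers_utterance(36, 12, 16000, 1.0))
```

Under these assumptions, adding layers or lengthening the dilation cycle are the two knobs that grow the field, and the longer cycle grows it much faster.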
Let me know if you have more questions. I'd love to hear how your experiment goes!
Finally got some results instead of faint noise. I trained on a smaller dataset (up, down, left, right). T=20 results in some recognizable voices. Thanks for helping!
And what about adding padding to the short samples instead of deleting them? If all samples in a minibatch are deleted, the training process will be terminated, which is annoying. https://github.com/lmnt-com/diffwave/blob/master/src/diffwave/dataset.py#L56
Great! Glad to hear you're getting something that resembles speech.
Padding is a valid way to handle short samples, though it has the effect of being less computationally efficient. Another option - especially if you have a lot of short samples which it sounds like you do - is to reduce the number of frames to train on (also in params.py).
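To make the trade-off concrete, here is a minimal sketch of the two policies discussed above: dropping clips shorter than the training length versus zero-padding them. Plain Python lists stand in for audio tensors, and fit_to_length, collate, and pad_short are hypothetical names, not the repository's dataset.py code.

```python
import random


def fit_to_length(audio, target_len, rng=random):
    """Return exactly target_len samples: zero-pad short clips, randomly crop long ones."""
    if len(audio) < target_len:
        return list(audio) + [0.0] * (target_len - len(audio))
    start = rng.randint(0, len(audio) - target_len)  # randint is inclusive on both ends
    return list(audio[start:start + target_len])


def collate(minibatch, target_len, pad_short=True):
    """Build a batch of fixed-length clips.

    With pad_short=False, clips shorter than target_len are dropped, so a
    minibatch made up only of short clips comes back empty.
    """
    kept = []
    for audio in minibatch:
        if len(audio) < target_len and not pad_short:
            continue  # dropping policy: skip the short clip
        kept.append(fit_to_length(audio, target_len))
    return kept


batch = [[0.1] * 5, [0.2] * 12]
print(len(collate(batch, 8, pad_short=False)))  # the 5-sample clip is dropped
print(len(collate(batch, 8, pad_short=True)))   # both clips kept, one padded
print(len(collate(batch, 4, pad_short=False)))  # shorter target: nothing is dropped
```

The last call illustrates the second suggestion: with a smaller training length, fewer clips count as short, so nothing needs to be padded or dropped.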
Yes, that works. My problems are solved. Thanks for your help!
Hi,
I'm very curious about how to implement unconditional generation for my experiments as well.
Would you have any code modifications showing how to do this?
Thanks so much
@moiseshorta, have you tried making the changes I described in https://github.com/lmnt-com/diffwave/issues/5#issuecomment-707326100? That should be a good starting point for your experiments.
For unconditional generation, is changing
y = self.dilated_conv(y) + conditioner
in model.py to
y = self.dilated_conv(y)
a workable approach? And how do I generate samples?
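A hedged sketch of the change being asked about: rather than deleting the conditioner term outright, the step could skip it whenever no conditioner is passed, so the same code path serves both modes. Plain Python lists stand in for tensors, and residual_step and double are hypothetical names for illustration, not code from model.py.

```python
def residual_step(y, dilated_conv, conditioner=None):
    """Apply the dilated convolution, adding the conditioner only if one is given."""
    y = dilated_conv(y)
    if conditioner is not None:
        # Conditional case: element-wise sum, mirroring "+ conditioner".
        y = [a + b for a, b in zip(y, conditioner)]
    return y


def double(values):
    """Toy stand-in for a dilated convolution layer."""
    return [2 * v for v in values]


print(residual_step([1.0, 2.0], double))                          # unconditional
print(residual_step([1.0, 2.0], double, conditioner=[0.5, 0.5]))  # conditional
```

In a real model, any layer that projects the spectrogram into the conditioner would presumably need the same None check, in line with the earlier note about removing the conditioning network.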