lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

Spectrogram Upsample #6

Closed alexdemartos closed 3 years ago

alexdemartos commented 3 years ago

Hi,

I'm having trouble with the upsampling of the mel spectrograms. How should I change the ConvTranspose2d kernel, stride and padding to match a hop size of 300? Thank you in advance.

sharvil commented 3 years ago

If you're changing the hop size, make sure you also increase n_fft to 2048.

For a hop size of 300, you'll want something like this (the two time-axis strides multiply to the hop size, 15 × 20 = 300):

self.conv1 = ConvTranspose2d(1, 1, [3, 30], stride=[1, 15], padding=[1, 8], output_padding=[0, 1])
self.conv2 = ConvTranspose2d(1, 1, [3, 40], stride=[1, 20], padding=[1, 10])

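As a sanity check on the numbers above, here is a minimal sketch in plain Python that applies the standard transposed-convolution output-size formula (with dilation 1) to both layers. The helper names `conv_transpose_out_len` and `upsampled_shape` are hypothetical, and the 80 mel bands and 100 frames in the example are assumed values chosen only for illustration. It should show that the mel axis is left unchanged while the time axis grows by exactly 300x.

```python
def conv_transpose_out_len(n_in, kernel, stride, padding, output_padding=0):
    """Output length along one axis of a transposed convolution (dilation 1)."""
    return (n_in - 1) * stride - 2 * padding + kernel + output_padding


def upsampled_shape(n_mels, n_frames):
    """Shape after the two layers suggested above (hypothetical helper)."""
    # conv1: kernel [3, 30], stride [1, 15], padding [1, 8], output_padding [0, 1]
    h = conv_transpose_out_len(n_mels, kernel=3, stride=1, padding=1)
    w = conv_transpose_out_len(n_frames, kernel=30, stride=15, padding=8,
                               output_padding=1)
    # conv2: kernel [3, 40], stride [1, 20], padding [1, 10]
    h = conv_transpose_out_len(h, kernel=3, stride=1, padding=1)
    w = conv_transpose_out_len(w, kernel=40, stride=20, padding=10)
    return h, w


# Assumed example input: 80 mel bands, 100 spectrogram frames.
print(upsampled_shape(80, 100))  # (80, 30000), i.e. 100 frames * 300 samples
```

The same arithmetic can be reused to check other hop sizes: pick two strides whose product equals the hop size, then adjust kernel, padding and output_padding until each layer's output length is exactly stride times its input length.
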
alexdemartos commented 3 years ago

Thanks very much for your help and fast response ;)