lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

hard coded assumption of 256 for hop_samples #24

Closed: michael-conrad closed this issue 2 years ago

michael-conrad commented 2 years ago

I don't know where in the code it happens, but there is effectively a hard-coded requirement that hop_samples be 256:

Errors for different hop_samples != 256:

hop_samples==254 → tensor a (12700) must match the size of tensor b (12800)
hop_samples==255 → tensor a (12750) must match the size of tensor b (12800)
hop_samples==257 → tensor a (12850) must match the size of tensor b (12800)
hop_samples==275 → tensor a (13750) must match the size of tensor b (12800)

where tensor a has size 50 * hop_samples and tensor b has size 50 * 256.
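To make the arithmetic concrete, here is a small sanity check (illustrative only) that reproduces the mismatched sizes from the messages above, assuming a 50-frame conditioning spectrogram:

```python
# Reproduce the size mismatch reported above for a 50-frame spectrogram crop.
frames = 50
for hop_samples in (254, 255, 257, 275):
    tensor_a = frames * hop_samples  # audio-side length, depends on hop_samples
    tensor_b = frames * 256          # conditioner-side length, fixed at 256 per frame
    print(f'hop_samples={hop_samples}: tensor a ({tensor_a}) vs tensor b ({tensor_b})')
```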

michael-conrad commented 2 years ago

Shouldn't a smaller hop_samples length result in more frames, not fewer, when dividing an audio sample up by the hop length?

jthickstun commented 2 years ago

The hop size is implicitly hard coded in the SpectrogramUpsampler (in model.py): this logic is responsible for upsampling the input spectrograms to the same rate as the output audio. The two ConvTranspose2d operations upsample the rate by a factor of 256 (16 x 16); you can play around with the stride and padding of these transposed convolutions to target different hop sizes.
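
For reference, the upsampler in model.py looks roughly like the sketch below (reconstructed from the description above, so treat the exact kernel sizes and padding as approximate): two ConvTranspose2d layers, each with a time-axis stride of 16, which fixes the upsampling factor at 16 × 16 = 256 samples per spectrogram frame.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramUpsampler(nn.Module):
  def __init__(self, n_mels):
    super().__init__()
    # Each layer stretches the time axis by its stride of 16:
    # out_len = (in_len - 1) * 16 - 2 * 8 + 32 = 16 * in_len
    self.conv1 = nn.ConvTranspose2d(1, 1, [3, 32], stride=[1, 16], padding=[1, 8])
    self.conv2 = nn.ConvTranspose2d(1, 1, [3, 32], stride=[1, 16], padding=[1, 8])

  def forward(self, x):
    x = torch.unsqueeze(x, 1)             # [N, n_mels, frames] -> [N, 1, n_mels, frames]
    x = F.leaky_relu(self.conv1(x), 0.4)  # frames -> 16 * frames
    x = F.leaky_relu(self.conv2(x), 0.4)  # 16 * frames -> 256 * frames
    return torch.squeeze(x, 1)            # back to [N, n_mels, 256 * frames]
```

To target a different hop, pick strides whose product equals the new hop_samples; e.g. for hop_samples = 128, one layer could keep stride 16 (kernel 32, padding 8) and the other use stride 8 (kernel 16, padding 4), since (L - 1) * 8 - 2 * 4 + 16 = 8 * L.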

You'll also want to re-run preprocess.py if you change the hop size.
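
The reason preprocess.py has to be re-run is that the mel spectrogram's hop must match the upsampling factor. A sketch of the relevant piece, assuming a torchaudio-based pipeline (the actual parameter values in the repo may differ):

```python
import torchaudio

hop_samples = 256  # must equal the product of the upsampler strides above
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050,
    n_fft=1024,
    win_length=hop_samples * 4,
    hop_length=hop_samples,  # one spectrogram frame per hop_samples of audio
    n_mels=80,
    power=1.0,
)
```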