ctlaltdefeat closed this issue 3 years ago
Why not mask the rest of the wav with zeros? The output wav length should be hop_size * mel_length.
You can also pad the spectrogram with -11.52 which should make the padded area equivalent to silence.
> Why not mask the rest of the wav with zeros? The output wav length should be hop_size * mel_length.
Thanks, you're right, that's the obvious solution. I do know the hop_size, so it's easy to implement.
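For anyone landing here later, a minimal sketch of that masking step. It assumes the vocoder output has already been squeezed to shape (batch, samples) and converted to a NumPy array, and that you kept the unpadded mel lengths from before batching. The function names `mask_padded_audio` and `trim_padded_audio` are made up for illustration and are not part of the HiFi-GAN code.

```python
import numpy as np

def mask_padded_audio(wavs, mel_lengths, hop_size):
    """Zero out samples that were generated from padded mel frames.

    wavs: array of shape (batch, samples), the batched vocoder output.
    mel_lengths: unpadded frame count for each item in the batch.
    hop_size: samples per mel frame, taken from your model config.
    """
    wavs = np.asarray(wavs)
    valid = np.asarray(mel_lengths) * hop_size   # valid samples per item
    positions = np.arange(wavs.shape[-1])        # shape (samples,)
    mask = positions[None, :] < valid[:, None]   # shape (batch, samples)
    return wavs * mask

def trim_padded_audio(wavs, mel_lengths, hop_size):
    """Alternative: return a list of variable-length wavs, cut at the true end."""
    return [wav[: n * hop_size] for wav, n in zip(wavs, mel_lengths)]
```

Masking keeps the batch rectangular, while trimming is more convenient if each wav is written to its own file. The same broadcasting comparison should work on torch tensors with `torch.arange`, though I have only sketched it in NumPy here.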
> You can also pad the spectrogram with -11.52 which should make the padded area equivalent to silence.
That would be better than zero-padding, but I haven't tested whether a trained HiFi-GAN model would actually convert those frames to total silence.
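A rough sketch of the padding variant, in case it helps. The -11.52 figure is approximately ln(1e-5), so this assumes the mels are natural-log magnitudes clipped at 1e-5; if your front end uses a different floor or log base, the pad value needs to change accordingly. `pad_mels` and `PAD_VALUE` are hypothetical names, and each mel is assumed to have shape (n_mels, frames).

```python
import numpy as np

# Assumed silence floor: natural log of a 1e-5 clip value, about -11.51.
PAD_VALUE = float(np.log(1e-5))

def pad_mels(mels, pad_value=PAD_VALUE):
    """Batch variable-length mels, filling the tail with the silence floor.

    mels: list of arrays, each of shape (n_mels, frames).
    Returns (batch, lengths): batch has shape (len(mels), n_mels, max_frames),
    lengths holds the original frame counts.
    """
    lengths = [m.shape[1] for m in mels]
    n_mels = mels[0].shape[0]
    batch = np.full((len(mels), n_mels, max(lengths)), pad_value, dtype=np.float32)
    for i, m in enumerate(mels):
        batch[i, :, : m.shape[1]] = m  # copy the real frames, leave the tail padded
    return batch, lengths
```

Since it is untested whether the model turns these frames into exact silence, it seems safest to keep the returned `lengths` and still zero or trim everything past hop_size * mel_length in the output.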
When doing batch synthesis (inference), I zero-pad the mel inputs so that they are the same length, which causes a harsh, buzzing sound to be generated by HiFi-GAN.
Assuming that batching is required for my application's performance purposes, what is the advised approach to dealing with this issue? I don't see support for passing in any sort of mask argument. Should I just try to heuristically cut the resulting wav audio so as to eliminate the noise at the end?