Closed prattcmp closed 3 years ago
It's the same input shape that is used in inference. If the shape doesn't match, you can simply add transpose code by referring to #27. No normalization is needed after generating the mel-spectrogram. It would be easier to find a solution if you post the details of what you have tried.
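A minimal sketch of the transpose step mentioned above, assuming the generator wants the (1, n_mels, frames) layout asked about in the question. The helper name `to_generator_layout` and the default `n_mels=80` are illustrative assumptions, not part of the repository; take the real value from your config file.

```python
import numpy as np


def to_generator_layout(mel, n_mels=80):
    """Return `mel` as a float32 array shaped (1, n_mels, frames).

    Hypothetical helper. n_mels=80 is an assumed default; use the
    value from your own config.
    """
    mel = np.asarray(mel, dtype=np.float32)

    # Add a batch axis to a single 2-D spectrogram.
    if mel.ndim == 2:
        mel = mel[np.newaxis, ...]
    if mel.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {mel.shape}")

    # Already (batch, n_mels, frames): nothing to do.
    # Note: if frames == n_mels the layout is ambiguous and is left as-is.
    if mel.shape[1] == n_mels:
        return mel

    # (batch, frames, n_mels): swap the last two axes.
    if mel.shape[2] == n_mels:
        return mel.transpose(0, 2, 1)

    raise ValueError(f"no axis of size n_mels={n_mels} in shape {mel.shape}")


# A Tacotron-style (frames, n_mels) output becomes (1, n_mels, frames).
tacotron_mel = np.zeros((120, 80))
print(to_generator_layout(tacotron_mel).shape)
```

Convert the result to a tensor on the generator's device before calling the model. If the output is still static with the right shape, the mel was probably computed with different parameters or scaling than the ones the model was trained on.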
@jik876 I’ve tried transposing. I’ve tried (1, n_mels, frames) and (1, frames, n_mels). I think the shape I’m using is correct, but all I get out is static. Is it a preprocessing problem? I use librosa to generate ground-truth mel spectrograms for my Tacotron model.
I’ve read through that issue and it did not help.
If you load the wav with librosa, it is already float32, so you do not need to divide the librosa-loaded wav by MAX_SIZE(32768). Doing so will make the input almost zero.
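To illustrate the point above, here is a small standard-library sketch of what happens when samples that are already floats in [-1, 1] (as `librosa.load` returns them) are divided by 32768 a second time. The constant name `MAX_SIZE` just mirrors the comment above, and the helper names and sample values are made up for the example.

```python
MAX_SIZE = 32768.0  # int16 full-scale value, named as in the comment above


def int16_to_float(samples):
    """Scale raw int16 sample values into the [-1, 1] float range."""
    return [s / MAX_SIZE for s in samples]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


# Made-up int16 samples, e.g. what a raw PCM reader would return.
raw_int16 = [0, 16384, -32768, 8192]

# Correct: scale int16 once. A librosa-loaded wav is already in this range.
as_float = int16_to_float(raw_int16)

# Mistake: dividing the float samples by MAX_SIZE again.
double_scaled = int16_to_float(as_float)

print(peak(as_float))       # full-scale signal
print(peak(double_scaled))  # roughly 3e-5, i.e. almost silence
```

A mel-spectrogram computed from the double-scaled signal describes near-silence, which would be consistent with the vocoder producing only noise.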
I'm closing this as there are no recent updates. Please reopen if you need additional comments.
I've seen a lot of general discussion about inputting generated mels into HiFi-GAN and of course we can see the hparams for mel spectrogram in each config file, but nothing that actually says what the format is for input x to Generator(x). Is it (1, n_mels, frames)? Is normalization expected? Nothing I've tried works.