keonlee9420 / DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
MIT License
311 stars, 44 forks

stft #2

Closed. KMzuka closed this issue 2 years ago

KMzuka commented 2 years ago

Hello, thank you very much for the open source project. I ran into a problem: the model converged successfully during training and the generated mel spectrogram looked very good, but when I fed it into my own HiFi-GAN vocoder, the resulting wav was just an unintelligible murmur. I am sure that the HiFi-GAN's sample rate, hop length, and win length are consistent with the DiffGAN-TTS model, so I suspect the problem lies in how the training audio is converted into a mel spectrogram. I noticed that you use pytorch-stft for this. Does its output differ significantly from that of librosa.stft?

keonlee9420 commented 2 years ago

Hi @KMzuka, thanks for your attention. Could you share the mel-spectrogram as .png and .npy files as well? As far as I can tell, the configurations in this project should match each other. Are you using your own version of HiFi-GAN?

KMzuka commented 2 years ago

I think I found the reason: the mel spectrogram fed to the vocoder undergoes some special processing.
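
For anyone hitting a similar mismatch, one way to narrow it down is to compare the mel spectrogram produced by the acoustic model with one produced by the vocoder's own preprocessing for the same utterance. The following is a minimal sketch, not code from this repository: `mel_stats` and `compare_mels` are hypothetical helper names, the `[n_mels, frames]` layout and the tolerance are assumptions, and the natural-log versus log10 example is only an illustration of one kind of difference, not a statement about what the special processing in this project is.

```python
import numpy as np


def mel_stats(mel):
    """Summarize a mel-spectrogram array (assumed layout: [n_mels, frames])."""
    mel = np.asarray(mel, dtype=np.float64)
    return {
        "n_mels": mel.shape[0],
        "min": float(mel.min()),
        "max": float(mel.max()),
        "mean": float(mel.mean()),
    }


def compare_mels(mel_a, mel_b, tol=1.0):
    """Return a list of human-readable differences between two mels.

    `tol` is an arbitrary threshold chosen for illustration only.
    """
    a, b = mel_stats(mel_a), mel_stats(mel_b)
    issues = []
    if a["n_mels"] != b["n_mels"]:
        issues.append(f"n_mels differs: {a['n_mels']} vs {b['n_mels']}")
    for key in ("min", "max", "mean"):
        if abs(a[key] - b[key]) > tol:
            issues.append(f"{key} differs: {a[key]:.3f} vs {b[key]:.3f}")
    return issues


# Synthetic illustration: the same energies compressed with two different
# log bases end up in clearly different value ranges.
rng = np.random.default_rng(0)
energy = rng.uniform(1e-5, 1.0, size=(80, 50))
mel_ln = np.log(energy)       # natural-log compression
mel_log10 = np.log10(energy)  # base-10 compression

for line in compare_mels(mel_ln, mel_log10):
    print(line)
```

In practice the two inputs would be loaded with `np.load` from the `.npy` files of the generated mel and of the mel extracted by the vocoder's preprocessing. An empty result does not prove the features are compatible; it only rules out gross differences in scale or channel count.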