rishikksh20 / FastSpeech2

PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Apache License 2.0

Noisy output when using the provided checkpoints #4

Closed by marzieh-razavi 4 years ago

marzieh-razavi commented 4 years ago

Hello, thanks for providing this repo. I get very noisy output when I use the checkpoints you provided (https://drive.google.com/drive/folders/1Fh7zr8zoTydNpD6hTNBPKUGN_s93Bqrs) for synthesis with synthesis.py. I have also trained a FastSpeech 2 model with your code, and my own checkpoints produce noisy output as well. I have attached the output generated with your checkpoint (checkpoint_model_150k_steps.pyt): output.zip. I would be grateful to know what causes the difference between the attached output and the samples provided in the sample folder.

rishikksh20 commented 4 years ago

@marzieh-razavi Yes, the pitch/F0 and energy are overfitted, which is why you get high-amplitude, noisy output. Use the 58k checkpoint with the WaveGlow vocoder and you should hear less noise. I am currently working on that part. Please share your TensorBoard, as your test_tts.wav is too noisy due to very high amplitude.

ming024 commented 4 years ago

I see a similar problem to @marzieh-razavi's. Here are my TensorBoard screenshots for training/validation: screenshot_1, screenshot

I think the model starts to overfit the training set at about 12k steps, but I still get noisy output when I generate audio with my reproduced checkpoint from step 10k (attached below). I am not sure whether the noise is due to overfitting or something else. test_tts.zip

rishikksh20 commented 4 years ago

@ming024 Are you using synthesis.py to generate the output? If so, please check your checkpoint in the Colab notebook instead, because there are some issues with MelGAN when I save the audio in synthesis.py. Alternatively, use WaveGlow to avoid the problem. Whenever I play the generated audio with IPython.display.Audio it sounds fine, but when I save the same audio (wav) using librosa.output.write_wav or scipy it becomes noisy, and I don't know why. This happens with audio generated by MelGAN in particular; WaveGlow does not have this problem.

ming024 commented 4 years ago

@rishikksh20 Saving the wav signal as 16-bit integers with scipy.io.wavfile.write fixes the problem on my machine.

from scipy.io.wavfile import write

write(path, hp.sample_rate, x.astype('int16'))

scipy_int16.zip
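A slightly more defensive version of the snippet above could look like the sketch below. It is illustrative only and not code from this repository: the helper names `to_int16` and `save_wav` are hypothetical, and the scaling heuristic rests on an assumption that the thread does not confirm, namely that a float waveform whose peak is at most 1.0 is in the [-1, 1] range, while anything larger is already at int16 scale (as a plain `x.astype('int16')` implicitly assumes).

```python
import numpy as np
from scipy.io.wavfile import write


def to_int16(x):
    """Convert a waveform to int16 PCM samples (hypothetical helper).

    Assumption: float input with peak <= 1.0 is in [-1, 1] and gets scaled
    up; any other input is treated as already being at int16 scale.
    A near-silent int16-scale signal would be misclassified by this heuristic.
    """
    x = np.asarray(x)
    if x.dtype == np.int16:
        return x  # already 16-bit PCM, nothing to do
    x = x.astype(np.float64)
    if x.size and np.max(np.abs(x)) <= 1.0:
        x = x * 32767.0  # scale [-1, 1] floats to the int16 range
    # Clip before casting so out-of-range samples saturate instead of wrapping.
    return np.clip(np.round(x), -32768, 32767).astype(np.int16)


def save_wav(path, sample_rate, x):
    """Write x as a 16-bit PCM WAV file with scipy.io.wavfile.write."""
    write(path, sample_rate, to_int16(x))
```

With these helpers, the call from the comment above would become `save_wav(path, hp.sample_rate, x)`. Clipping before the cast matters because casting out-of-range values to int16 does not saturate, which can itself sound like loud noise.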

rishikksh20 commented 4 years ago

@ming024 thanks

rishikksh20 commented 4 years ago

@marzieh-razavi Please pull the latest commit, then run synthesis.py with your checkpoint. Please report here if the issue still exists.

marzieh-razavi commented 4 years ago

@rishikksh20 Thanks a lot for providing the update. It works now.