ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License

My generated outputs all have a beeping sound, although the alignment is correct. #151

Status: Open. wolfassi123 opened this issue 2 years ago.

wolfassi123 commented 2 years ago

I have been training on my own custom data for a while now. I used an aligner, and the alignment looks correct. I added the resulting TextGrid files to the training data and trained for about 2 hours on a GPU (I have around 40 minutes of augmented data), but all of my synthesized outputs come out as beeps. Any idea how to solve this? Should I be using more data or training for longer? Or is my data bad?
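For readers unfamiliar with how aligner output feeds into FastSpeech 2: the TextGrid phone intervals (in seconds) are turned into per-phone durations measured in mel-spectrogram frames. The sketch below shows one way to do that conversion. It is not the repository's code; the function name, the tuple input format, and the `sampling_rate=22050` / `hop_length=256` defaults are assumptions, and the values must match whatever your own mel-spectrogram extraction uses.

```python
from typing import List, Tuple

# Hypothetical input format: (start_sec, end_sec, phone) tuples, as they might be
# read from the "phones" tier of a TextGrid produced by a forced aligner.
Interval = Tuple[float, float, str]


def intervals_to_durations(
    intervals: List[Interval],
    sampling_rate: int = 22050,  # assumed; must match your audio preprocessing
    hop_length: int = 256,       # assumed; must match the mel-spectrogram hop size
) -> Tuple[List[str], List[int]]:
    """Convert phone intervals in seconds into per-phone durations in mel frames."""
    phones: List[str] = []
    durations: List[int] = []
    for start, end, phone in intervals:
        # Round each boundary to a frame index first, then take the difference,
        # so rounding errors do not accumulate across the utterance.
        start_frame = int(round(start * sampling_rate / hop_length))
        end_frame = int(round(end * sampling_rate / hop_length))
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


phones, durations = intervals_to_durations(
    [(0.0, 0.1, "HH"), (0.1, 0.25, "AH0"), (0.25, 0.4, "L")]
)
print(phones, durations)
```

One sanity check that follows from this: the durations for an utterance should sum to the number of mel frames extracted for it. If the sample rate or hop size used here differs from the one used for the mel-spectrograms, the two lengths drift apart.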

samin9796 commented 2 years ago

@wolfassi123 Were you able to fix it? I am also getting a beeping sound when training a model on the LJSpeech dataset.

zaynabmu commented 2 years ago

Were you able to fix it? I'm facing the same problem. @wolfassi123 @samin9796

hhm853610070 commented 1 year ago

@zaynabmu @samin9796 Did you fix it? I face this problem when I use frame-level pitch and energy features. The audio synthesized during the training phase (for both train and val data) sounds good, but the audio synthesized at inference time sounds bad.
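To make the "frame-level vs. phoneme-level" distinction above concrete: a frame-level pitch or energy feature has one value per mel frame, while a phoneme-level feature collapses those frames into one value per phone using the durations. The NumPy sketch below shows that averaging and its inverse. The function names are hypothetical and this is not the repository's implementation; it only illustrates how the two feature granularities relate.

```python
import numpy as np


def average_by_duration(frame_values: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Collapse a frame-level contour (one value per frame) to one value per phone."""
    if frame_values.shape[0] != int(durations.sum()):
        raise ValueError("contour length must equal the sum of the durations")
    phone_values = np.zeros(len(durations), dtype=np.float64)
    pos = 0
    for i, d in enumerate(durations):
        if d > 0:
            # Mean over the frames belonging to phone i.
            phone_values[i] = frame_values[pos:pos + d].mean()
        # Zero-duration phones keep the placeholder value 0.0.
        pos += d
    return phone_values


def expand_by_duration(phone_values: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Inverse direction: repeat each phone-level value for its number of frames."""
    return np.repeat(phone_values, durations)


pitch = np.array([100.0, 110.0, 120.0, 200.0, 210.0])  # toy frame-level contour
durs = np.array([3, 2])
per_phone = average_by_duration(pitch, durs)
print(per_phone, expand_by_duration(per_phone, durs))
```

With phoneme-level features, the variance predictor only has to predict one value per phone. With frame-level features, it has to predict every frame of the contour. That is one plausible place for training-time outputs and inference-time outputs to diverge, though this thread does not confirm it as the cause.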