keonlee9420 / DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
MIT License

Can I ask you some questions about mel-spectrogram? #11

Open Dyongh613 opened 2 years ago

Dyongh613 commented 2 years ago

Hi @keonlee9420, I have some questions about the mel-spectrogram shown in the attached image. In that mel-spectrogram, the alignment has been learned, but the horizontal details have not appeared yet. What do you think is causing this?

keonlee9420 commented 2 years ago

Hi @qw1260497397, thanks for your attention. I need more information about your training. How many steps did you train for before generating the mel-spectrogram? What dataset did you use? Did you follow the config in this repo or change something?

At first glance, it seems that more training will solve it.

Dyongh613 commented 2 years ago

Hi @keonlee9420. In my work, I use LJSpeech, and I added the diffusion mechanism to PortaSpeech. The first stage was trained for 160,000 steps with a batch size of 64. This spectrogram was generated after training the second stage for 150,000 steps.

keonlee9420 commented 2 years ago

Oh, I see. Although I don't know the details of your implementation, I can give you one tip: replace the modules one at a time with the simplest, most reliable architecture you can. For example, you could replace the encoder in PortaSpeech with FastSpeech2's text encoder to check whether the word-to-phoneme alignment is working.
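
As a rough illustration of the module-by-module swap, here is a minimal PyTorch sketch. Every class and function name in it (`SuspectEncoder`, `SimpleTextEncoder`, `ToyAcousticModel`, `swap_module`) is hypothetical and is not taken from this repo, PortaSpeech, or FastSpeech2; the tiny encoders are placeholders for whatever real modules are being compared. It only shows the mechanics of replacing one submodule while keeping the rest of the model fixed.

```python
import torch
from torch import nn


class SuspectEncoder(nn.Module):
    """Hypothetical placeholder for the encoder under suspicion."""

    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(tokens))


class SimpleTextEncoder(nn.Module):
    """Hypothetical placeholder for a simple, well-understood baseline encoder."""

    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(tokens))


class ToyAcousticModel(nn.Module):
    """Hypothetical stand-in for the full model: encoder followed by a mel projection."""

    def __init__(self, encoder: nn.Module, hidden: int = 64, n_mels: int = 80):
        super().__init__()
        self.encoder = encoder
        self.to_mel = nn.Linear(hidden, n_mels)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.to_mel(self.encoder(tokens))


def swap_module(model: nn.Module, name: str, replacement: nn.Module) -> nn.Module:
    """Replace a direct child module by attribute name and return the old one."""
    old = getattr(model, name)
    setattr(model, name, replacement)
    return old


# Swap only the encoder; everything downstream stays untouched.
model = ToyAcousticModel(encoder=SuspectEncoder(vocab_size=50))
old_encoder = swap_module(model, "encoder", SimpleTextEncoder(vocab_size=50))

tokens = torch.randint(0, 50, (2, 7))  # batch of 2 sequences, 7 tokens each
mel = model(tokens)                    # shape: (batch, time, n_mels)
```

The swapped-in module has to keep the same input and output shapes as the one it replaces. If the problem disappears after retraining with the baseline in place, the replaced module is the likely cause; if it persists, put the original back and move on to the next module.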