Open Dyongh613 opened 2 years ago
Hi @qw1260497397 , thanks for your attention. I need more information about your training. How many steps did you take to generate the mel-spectrogram? What dataset did you use? Did you follow the config in this repo or change something?
At first glance, it seems that more training will solve it.
Hi @keonlee9420. In my work, I use LJSpeech, and I add a diffusion mechanism to PortaSpeech. The first stage is trained for 160,000 steps with a batch size of 64. This spectrogram was generated after training the second stage for 150,000 steps.
Oh, I see. Although I don't know the details of your implementation, I can give you one tip: replace each module one by one with the simplest but surest architecture. For example, you could replace the encoder in PortaSpeech with FastSpeech2's text encoder to check whether the word-to-phoneme alignment is working or not.
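To make the module-swapping idea concrete, here is a minimal schematic sketch. All class names below (`PortaSpeechEncoder`, `FastSpeech2Encoder`, `TTSModel`) are hypothetical placeholders, not the real implementations — the point is only the dependency-injection pattern that lets you swap one component while holding the rest of the pipeline fixed:

```python
# Schematic sketch of swapping one module at a time for debugging.
# These classes are hypothetical stand-ins, not real PortaSpeech/FastSpeech2 code.

class PortaSpeechEncoder:
    """Stand-in for a word-level encoder with word-to-phoneme alignment."""
    def encode(self, phonemes):
        return [f"PS({p})" for p in phonemes]

class FastSpeech2Encoder:
    """Stand-in for a plain Transformer text encoder (the 'surest' baseline)."""
    def encode(self, phonemes):
        return [f"FS2({p})" for p in phonemes]

class TTSModel:
    def __init__(self, encoder):
        # The encoder is injected, so it can be replaced in isolation
        # while the decoder/vocoder side of the pipeline stays untouched.
        self.encoder = encoder

    def forward(self, phonemes):
        return self.encoder.encode(phonemes)

phonemes = ["HH", "AH", "L", "OW"]

baseline = TTSModel(PortaSpeechEncoder())
simplified = TTSModel(FastSpeech2Encoder())  # swap in the simpler encoder

print(baseline.forward(phonemes))
print(simplified.forward(phonemes))
```

If the simplified variant trains cleanly while the original does not, the problem is isolated to the swapped module; repeating this for each component narrows down the faulty one.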
Hi @keonlee9420, I have some questions about the mel-spectrogram. In the attached picture, the mel-spectrogram's alignment has been generated, but the horizontal (harmonic) details have not emerged yet. What do you think is causing this?