huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Generated Samples are noisy #37

Open SandyPanda-MLDL opened 4 months ago

SandyPanda-MLDL commented 4 months ago

I used the pretrained model provided in the Google Drive folder of the official repo. When I ran inference.py with that checkpoint, the generated samples were very noisy for every number of reverse diffusion steps I tried (10, 20, 30, 40, 50, 70). I used the LibriTTS checkpoint, not the LJ-Speech one. The samples on the demo page, however, are of good quality. Any suggestions would be appreciated.

li1jkdaw commented 1 month ago

Hi! You can check this issue. In brief: the multi-speaker checkpoint trained on LibriTTS is provided only to show that GradTTS can work in a multi-speaker setting, and its quality may not be good for an arbitrary speaker. All the results in the paper were obtained in the single-speaker setting (LJ-Speech checkpoint).