SandyPanda-MLDL opened this issue 6 months ago (status: Open)
I have used the pretrained model provided in the Google Drive of the official repo. When I ran inference.py with this checkpoint, the generated samples were very noisy for every number of reverse-diffusion steps I tried (10, 20, 30, 40, 50, 70). Note that I used the checkpoint of the LibriTTS model (not the LJ-Speech one). However, the samples on the demo page sound quite good. Any suggestions would be appreciated.

Hi! You can check this issue. In brief: the multi-speaker checkpoint trained on LibriTTS is provided only to show that GradTTS can work in a multi-speaker setting, and its quality may not be good for an arbitrary speaker. All the results in the paper were obtained in the single-speaker setting (LJ-Speech checkpoint).
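For reference, here is roughly how the single-speaker LJ-Speech checkpoint can be driven with a chosen number of reverse-diffusion steps. This is only a sketch based on my reading of the official repo's inference.py: the checkpoint path, the `params` attribute names, and the `forward()` keyword arguments are assumptions and should be double-checked against the actual script.

```python
# Minimal sketch (not the repo's exact code): synthesize one sentence with the
# single-speaker Grad-TTS checkpoint and an explicit number of reverse-diffusion
# steps. Assumes it is run from the root of the official Grad-TTS repository.
import torch

import params
from model import GradTTS
from text import text_to_sequence, cmudict
from text.symbols import symbols
from utils import intersperse

N_TIMESTEPS = 50                      # number of reverse-diffusion steps to try
CHECKPOINT = 'checkpts/grad-tts.pt'   # assumed path to the LJ-Speech checkpoint

# n_spks must match the checkpoint: 1 for LJ-Speech, 247 for the LibriTTS model.
generator = GradTTS(len(symbols) + 1, 1, params.spk_emb_dim,
                    params.n_enc_channels, params.filter_channels,
                    params.filter_channels_dp, params.n_heads,
                    params.n_enc_layers, params.enc_kernel, params.enc_dropout,
                    params.window_size, params.n_feats, params.dec_dim,
                    params.beta_min, params.beta_max, params.pe_scale)
generator.load_state_dict(torch.load(CHECKPOINT, map_location='cpu'))
generator.eval()

# Convert text to a phoneme id sequence, interspersed with blank tokens.
cmu = cmudict.CMUDict('./resources/cmu_dictionary')
text = 'Here is a test sentence for Grad-TTS.'
x = torch.LongTensor(intersperse(text_to_sequence(text, dictionary=cmu),
                                 len(symbols)))[None]
x_lengths = torch.LongTensor([x.shape[-1]])

with torch.no_grad():
    # spk=None for the single-speaker model; pass a speaker id tensor when
    # using the LibriTTS checkpoint instead.
    y_enc, y_dec, attn = generator.forward(x, x_lengths,
                                           n_timesteps=N_TIMESTEPS,
                                           temperature=1.5, stoc=False,
                                           spk=None, length_scale=0.91)

# y_dec is the predicted mel-spectrogram; feed it to the HiFi-GAN vocoder
# exactly as inference.py does to obtain the waveform.
```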