Open cschaefer26 opened 3 years ago
Hi, great repo!
I found that the audio quality improves considerably with a slightly larger ResNet, as suggested in https://arxiv.org/pdf/2005.05106.pdf. The shaky and metallic artefacts are reduced a lot.
Here is a comparison of your pretrained LJSpeech model with a model I am still training (for TTS I used https://github.com/as-ideas/ForwardTacotron):
Original (6400 epochs): https://drive.google.com/file/d/1LOIB9B7LDX9g-kVu_p1anGJgJ5vjE27s/view?usp=sharing
Larger ResNet (2000 epochs): https://drive.google.com/file/d/19_d2SQU1xZi-o90MJ8NcKhIS6AFwliH-/view?usp=sharing
If you are interested, I could open a PR that makes the number of ResNet layers configurable.
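
For context, here is a rough sketch of why a deeper residual stack might help. It assumes the stack uses kernel size 3 and dilations that grow as powers of 3, which is an assumption for illustration and not taken from this repo's code. Under that assumption, each extra layer roughly triples the receptive field. The helper names are hypothetical.

```python
def dilation_schedule(num_layers, base=3):
    """Exponentially growing dilations: base**0, base**1, ... (assumed schedule)."""
    return [base ** i for i in range(num_layers)]


def receptive_field(kernel_size, dilations):
    """Receptive field, in time steps, of stacked dilated 1-D convolutions."""
    # Each layer with dilation d widens the field by (kernel_size - 1) * d.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Compare a 3-layer stack with a 4-layer stack (assumed kernel size 3).
    for num_layers in (3, 4):
        dilations = dilation_schedule(num_layers)
        print(num_layers, dilations, receptive_field(3, dilations))
```

With these assumed settings, going from 3 to 4 layers widens the receptive field of one stack from 27 to 81 time steps, so a PR could expose the number of layers as a config value.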