mush42 / optispeech

A lightweight end-to-end text-to-speech model
MIT License
91 stars 12 forks

hoarse pronunciation #14

Open Shenkailai opened 5 days ago

Shenkailai commented 5 days ago

First of all, I would like to express my sincere gratitude to the authors. This is an excellent piece of work! I have used ConvNext_TTS, and its synthesis quality is impressive, with very fast inference speed.

I trained the model on roughly 300 hours of Chinese and English speech. However, the synthesized speech occasionally turns hoarse on individual words, and training for more epochs does not seem to resolve the issue. I've trained for approximately 4M steps, but the problem persists.

[image]

For example, the last word of this speech segment seems to lack properly generated harmonics.

Attachment: baker_004.zip
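One rough way to check a suspected segment for missing harmonics is to measure how periodic each frame is. The sketch below is not part of optispeech; `periodicity`, `harmonic_frame`, and `noise_frame` are hypothetical helper names, and the 60 to 400 Hz pitch range is an assumption. It scores a frame by its peak normalized autocorrelation over plausible pitch lags, so a clearly voiced frame scores near 1 and a noisy, hoarse-sounding frame scores much lower.

```python
import math
import random


def periodicity(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak normalized autocorrelation of `frame` over plausible pitch lags.

    Values near 1.0 suggest a strongly periodic (harmonic) frame; values
    near 0.0 suggest a noisy or aperiodic one. The pitch range is an
    assumption and may need adjusting for a given speaker.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        head = x[: n - lag]
        tail = x[lag:]
        cross = sum(a * b for a, b in zip(head, tail))
        norm = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if norm > 0.0:
            best = max(best, cross / norm)
    return best


def harmonic_frame(f0, sample_rate, length, n_harmonics=5):
    """Synthetic voiced frame: a fundamental plus a few decaying harmonics."""
    return [
        sum(
            math.sin(2 * math.pi * f0 * k * t / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for t in range(length)
    ]


def noise_frame(length, seed=0):
    """Synthetic aperiodic frame: Gaussian white noise with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]
```

In practice, the samples would come from the synthesized audio rather than from the synthetic helpers: split the waveform into short frames, score each one, and compare the frames covering the hoarse word against neighboring voiced frames.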

RZJM commented 5 days ago

Hello, I used the Chinese frontend from your PR and got the error below during training. Have you run into it? [Screenshot from 2024-11-08 14-31-00]

Shenkailai commented 5 days ago

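Some of the code in that PR has problems, and the fixed code has not been submitted yet. Since the Chinese speech trained so far shows the issue described above, and this part of the Chinese frontend may be the cause, I am holding off on submitting a new PR for now.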

RZJM commented 5 days ago

OK, understood.