keonlee9420 / Expressive-FastSpeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

demo #1

Closed koala7580 closed 3 years ago

koala7580 commented 3 years ago

Can you share your synthesized audio? Thanks.

keonlee9420 commented 3 years ago

Hi @koala7580, sorry for the late reply. Unfortunately, I did not get permission to publicly share samples synthesized from the pre-trained models.

I used these datasets to show how to handle raw data (e.g., video or noisy recordings) for (supervised) non-autoregressive TTS. For your reference, though: since the datasets in this project were not designed for TTS, you may want to use RAVDESS or other clean datasets to get crystal-clear audio. Alternatively, you can apply a speech enhancement model to denoise the output audio (I confirmed that this method works) while continuing to train on the provided datasets.
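The author does not name a specific speech enhancement model, so as a minimal stand-in, here is a classic spectral-subtraction denoiser built only on NumPy/SciPy. It assumes the first fraction of a second of the clip is noise-only; the function name, parameters, and toy signal are all illustrative, not part of this repo.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(audio, sr, noise_seconds=0.25, floor=0.02):
    """Estimate the noise spectrum from the first `noise_seconds`
    (assumed non-speech) and subtract it from every frame."""
    _, _, spec = stft(audio, fs=sr, nperseg=512)  # hop = 256 samples
    mag, phase = np.abs(spec), np.angle(spec)
    noise_frames = max(1, int(noise_seconds * sr / 256))
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    # Spectral floor keeps a little of the original magnitude to
    # reduce "musical noise" artifacts from over-subtraction.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=512)
    return clean[: len(audio)]

# Toy usage: a 220 Hz tone starting at 0.3 s, buried in white noise.
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 220 * t) * (t > 0.3) + 0.1 * np.random.randn(sr)
denoised = spectral_subtract(noisy, sr)
```

A learned enhancement model will do far better on real speech; this sketch only illustrates the "denoise the output" step in a self-contained way.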

nikich340 commented 2 years ago

Any example from anyone?

ttslr commented 9 months ago

The synthesized speech of IEMOCAP data is very noisy.

SolitaryWayfarer commented 8 months ago

> The synthesized speech of IEMOCAP data is very noisy.

Same here: the synthesized audio is noisy and has electronic artifacts, and that is already the result of training after applying speech enhancement to the original dataset with FullNet. Did you manage to solve this?