ICASSP Show and Tell Demonstration

Demo homepage link: https://sungjae-cho.github.io/ICASSP2020_STDemo/

Learning to Transfer Multi-speaker Emotional Prosody to a Neutral Speaker

In this demo, we are unable to provide an interactive environment that incorporates SSML to easily control the prosody of the input text. We apologize that no interactive environment is provided.
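For context, SSML (Speech Synthesis Markup Language) is the W3C standard markup that such an interactive environment would typically accept. The snippet below is purely illustrative of standard `<prosody>` markup; it is not something our demo currently consumes.

```xml
<!-- Illustrative SSML only: standard W3C <prosody> attributes;
     our demo system does not currently accept this markup. -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis">
  You can leave the first phrase as it is,
  <prosody rate="slow" pitch="low" volume="soft">
    and render this one with subdued prosody.
  </prosody>
</speak>
```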

All of the phrases below were unseen by our TTS model during training.

Demo 1: Multi-speaker emotional TTS

In the first demo, we demonstrate multi-speaker emotional TTS. Through our system, you can synthesize 5 emotional voices for each of two female speakers, B and J. The training data contained both neutral and emotional speech recordings of these two speakers; this is the major difference from the second demo. We will play 3 examples for each emotion-speaker pair. Pay attention to how the audio differs across emotions and speakers. First, you will hear a neutral voice, and then the corresponding emotional voice.

To listen to audios, go to https://sungjae-cho.github.io/ICASSP2020_STDemo/.
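For readers curious how such conditioning can work, below is a minimal PyTorch sketch of an encoder that looks up separate speaker and emotion embeddings and concatenates them to the text encoding at every time step. The class name, dimensions, and label assignments are our own illustrative assumptions; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConditionedEncoder(nn.Module):
    """Toy Tacotron-style encoder conditioned on speaker and emotion IDs."""
    def __init__(self, n_symbols=148, n_speakers=3, n_emotions=5, dim=256):
        super().__init__()
        self.text_emb = nn.Embedding(n_symbols, dim)     # character/phoneme table
        self.speaker_emb = nn.Embedding(n_speakers, 32)  # e.g., B, J, L (assumed IDs)
        self.emotion_emb = nn.Embedding(n_emotions, 32)  # emotion label set (assumed)
        self.rnn = nn.LSTM(dim + 64, dim, batch_first=True, bidirectional=True)

    def forward(self, text_ids, speaker_id, emotion_id):
        x = self.text_emb(text_ids)                                   # (B, T, dim)
        cond = torch.cat([self.speaker_emb(speaker_id),
                          self.emotion_emb(emotion_id)], dim=-1)      # (B, 64)
        cond = cond.unsqueeze(1).expand(-1, x.size(1), -1)            # broadcast over T
        out, _ = self.rnn(torch.cat([x, cond], dim=-1))
        return out  # would feed an attention-based decoder (not shown)

enc = ConditionedEncoder()
text = torch.randint(0, 148, (1, 40))                # a dummy 40-symbol input
h = enc(text, torch.tensor([0]), torch.tensor([2]))  # speaker B, emotion id 2
print(h.shape)                                       # torch.Size([1, 40, 512])
```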

Demo 2: Emotional TTS spoken by a neutral speaker

In the second demo, we demonstrate emotional TTS spoken by a neutral speaker. Through our system, you can synthesize 5 emotional voices for a female speaker, L. The training data contained only neutral speech recordings of speaker L. Our system can nevertheless generate emotional speech for speaker L because emotional prosody is transferred from the other two speakers, whose emotional speech is learned jointly. Let's listen to the emotional voices of speaker L.

To listen to audios, go to https://sungjae-cho.github.io/ICASSP2020_STDemo/.
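A hedged way to see why such transfer is possible: if the emotion embedding table is shared across speakers (our assumption about the architecture, not a confirmed detail), then an emotion vector learned from B's and J's recordings can be paired with L's speaker vector at inference time, even though that (speaker, emotion) combination never appears in training. A minimal sketch:

```python
import torch
import torch.nn as nn

# Shared lookup tables: one row per speaker, one row per emotion.
speaker_emb = nn.Embedding(3, 32)  # 0: B, 1: J, 2: L (assumed ID assignment)
emotion_emb = nn.Embedding(5, 32)  # emotion label set (assumed)

# Training coverage: B and J appear with every emotion; L only as neutral (id 0).
train_pairs = [(s, e) for s in (0, 1) for e in range(5)] + [(2, 0)]

# Inference: because the emotion table is speaker-independent, we may request
# a pair unseen in training, e.g. speaker L (2) with emotion 3.
cond = torch.cat([speaker_emb(torch.tensor([2])),
                  emotion_emb(torch.tensor([3]))], dim=-1)
print((2, 3) in train_pairs)  # False: this combination was never trained on
print(cond.shape)             # torch.Size([1, 64]) conditioning vector
```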

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2016-0-00562 (R0124-16-0002), Emotional Intelligence Technology to Infer Human Emotion and Carry on Dialogue Accordingly], and by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2019.