thorstenMueller / Thorsten-Voice

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any license struggles.
http://www.thorsten-voice.de
Creative Commons Zero v1.0 Universal

Recording free emotional german dataset #13

Closed thorstenMueller closed 3 years ago

thorstenMueller commented 3 years ago

Because there are some interesting papers based on emotional speech datasets, I've decided to go back to the microphone and record a free-to-use emotional German dataset.

I've prepared a German corpus:

I'll record every phrase in the following emotions:

I'm no professional voice actor, so the quality might not be as good as some expect.

I'll keep you updated on the recording process.

thorstenMueller commented 3 years ago

Recording of the "angry" phrases for the emotional dataset is in progress. Always keep in mind that I'm no professional voice artist. https://soundcloud.com/thorsten-mueller-395984278/sets/thorsten-emotional-dataset

@monatis what do you think?

monatis commented 3 years ago

@thorstenMueller Wow, you're doing an amazing job. I can easily hear the emotion in the sample. I strongly believe that it will have a substantial impact on the literature.

thorstenMueller commented 3 years ago

Thanks @monatis, I highly appreciate your kind words :+1:. If my dataset were mentioned in the literature, it would be an honor for me, but I just hope my voice contribution is useful for someone.

thorstenMueller commented 3 years ago

Short update:

I've published some clips on SoundCloud:

A comparison of two sentences in all currently recorded emotions can be heard here:

thorstenMueller commented 3 years ago

Recording of the emotional dataset is finished :speech_balloon: :partying_face: It was harder and took longer than I thought to express emotions on phrases that do not match the emotion, but I tried my best.

Please keep in mind that I'm just a guy sharing his voice with you folks; I'm no professional voice artist.

@domcross is now doing his audio optimization magic on the recordings, and once he's done I'll publish them.

Until then, here are two samples of how the results sound:

Mist, wieder nichts geschafft. Es kann doch nicht so schwer sein, einen Ring ins Feuer zu werfen. (English: "Damn, nothing accomplished again. It can't be that hard to throw a ring into the fire.")

Samples are spoken in the following emotion order:

https://soundcloud.com/thorsten-mueller-395984278/sets/thorsten-de-emotional-dataset

monatis commented 3 years ago

@thorstenMueller This is awesome! Congrats on the hard work and the great job you're doing here 👏 Looking forward to the release of the processed full version.

I'm truly happy to witness such great dataset contributions to the community. Can't wait to work with this one 🚀

thorstenMueller commented 3 years ago

Audio optimisation for my emotional dataset contribution is finished (thanks @domcross). I've compressed the audio files and CSV metadata, and the upload is ready to go. The release is planned as a gift for the Easter holidays :rabbit:. See my Twitter account for updates (https://twitter.com/ThorstenVoice) and, of course, this topic here.

monatis commented 3 years ago

Great news! Thank you @thorstenMueller and @domcross. By the way, Google published a parallel version of Tacotron 2, which is non-autoregressive and thus faster, and, according to subjective listening tests, more stable and more natural. I have started to work on this version.

snakers4 commented 3 years ago

I wonder how you chose these 304 phrases. It is quite difficult to find phrases that match the emotion.

thorstenMueller commented 3 years ago

@snakers4 The phrases are not emotion-specific. All sentences are generic, and I just pronounced them in different emotions, as well as I could, since I am no professional voice talent.

snakers4 commented 3 years ago

Well, you did a great job, since when I was listening to the "Wütend" phrases, I instantly remembered that scene from "Der Untergang".

On a more serious note, looks like it was very difficult to record ~300 phrases per emotion, since it took ~2 months? Were there any non-obvious issues? I am asking because we are planning to record something similar as well.

snakers4 commented 3 years ago

Also check this out: https://silyfox.github.io/iscslp-98-demo/. It sounds like anime, but in essence it is similar work to yours.

thorstenMueller commented 3 years ago

> Well, you did a great job, since when I was listening to the "Wütend" phrases, I instantly remembered that scene from "Der Untergang".
>
> On a more serious note, looks like it was very difficult to record ~300 phrases per emotion, since it took ~2 months? Were there any non-obvious issues? I am asking because we are planning to record something similar as well.

Thanks :-) It's not easy to get into the right emotion on sentences that do not match that emotion.

300 phrases in 5 emotions = 1,500 recordings. I didn't record every day, so it took some time until it was finished.

snakers4 commented 3 years ago

> 300 phrases in 5 emotions = 1,500 recordings.

When we recorded speakers for "normal", they could do about 15k phrases in 3-4 weeks, also not every day. It looks like recording with emotion is more difficult.
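
To put the two figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate numbers quoted in this thread. The midpoint of 3.5 weeks and the average month length are assumptions, and `recordings_per_week` is a hypothetical helper name. Neither side recorded every day, and it is unclear whether the 15k figure is per speaker, so the result is calendar-time throughput only, not a per-session comparison.

```python
# Rough calendar-time throughput from the approximate figures in this thread.
WEEKS_PER_MONTH = 52 / 12  # assumption: average month length in weeks


def recordings_per_week(total_recordings: int, weeks: float) -> float:
    """Average number of recordings per calendar week."""
    return total_recordings / weeks


# Emotional dataset: ~300 phrases x 5 emotions over ~2 months.
emotional_total = 300 * 5
emotional_rate = recordings_per_week(emotional_total, 2 * WEEKS_PER_MONTH)

# "Normal" speech: ~15k phrases in 3-4 weeks; 3.5 weeks is an assumed midpoint.
neutral_rate = recordings_per_week(15_000, 3.5)

print(f"emotional: {emotional_total} recordings, ~{emotional_rate:.0f} per week")
print(f"normal:    ~{neutral_rate:.0f} per week")
print(f"ratio:     ~{neutral_rate / emotional_rate:.0f}x")
```

Under these assumptions the gap is more than an order of magnitude, which is consistent with the impression that emotional recording is slower, although differences in available recording time could explain part of it.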

snakers4 commented 3 years ago

Also, I found @ThorstenMueller on Telegram. Is that you?

thorstenMueller commented 3 years ago

> Also I have found @thorstenMueller on telegram, is it you?

No, I'm not using Telegram. You can PM me in the @coqui-ai Gitter chat (https://gitter.im/coqui-ai/TTS).

thorstenMueller commented 3 years ago

Hello. I hope you're having relaxing Easter holidays :rabbit: :slightly_smiling_face:.

As promised, I've released my emotional "Thorsten" dataset :partying_face:. I hope it's useful for someone and (as always) please keep in mind that I'm no professional voice talent, just a guy sharing his voice.

Info and download links can be found here: https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-Thorsten-emotional

thorstenMueller commented 3 years ago

I'll close this issue because my emotional dataset has been released. Feel free to keep posting feedback here.

thorstenMueller commented 3 years ago

No sooner is one release done than new ideas come to mind. See here for "drunk" samples (just pronounced this way, I wasn't drunk while recording ;-) ): https://twitter.com/ThorstenVoice/status/1386060775488430085

thorstenMueller commented 3 years ago

@monatis I'm just curious: did you have time to listen to my emotional dataset and/or find an intended use for it?

monatis commented 3 years ago

Hi @thorstenMueller, I listened to some samples and they sounded quite promising. Unfortunately I was dealing with my parents' health problems at that time, so I didn't get around to giving feedback. Sorry for that. I'm currently working on an NLP project in Turkish; after that I want to return to the Emotional Thorsten dataset to run some experiments with the following: (1) Introducing an emotion vector to the architecture of Tacotron2 and finetuning it for Emotional TTS. (2) Training few-shot adaptation methods like AdaDurIAN. Let's see what we can build.
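
As a rough illustration of idea (1): one common way to condition a sequence model on a label is to look up a vector per emotion and concatenate it to every encoder time step. The sketch below shows only that mechanism in plain Python. It is not Tacotron 2 code and not necessarily the design monatis has in mind. `EMOTIONS`, `EMBED_DIM`, `emotion_table` and `condition_on_emotion` are hypothetical names, and "angry" is the only emotion label taken from this thread.

```python
import random

# Hypothetical label set: only "angry" is named in this thread,
# the remaining entries are placeholders.
EMOTIONS = ["angry", "emotion_2", "emotion_3", "emotion_4", "emotion_5"]
EMBED_DIM = 4  # illustrative size, not taken from any paper

rng = random.Random(0)
# One vector per emotion. Randomly initialised here; in a real model
# these values would be learned during training.
emotion_table = {
    name: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for name in EMOTIONS
}


def condition_on_emotion(encoder_outputs, emotion):
    """Return new frames with the emotion vector appended to each time step."""
    vec = emotion_table[emotion]
    return [frame + vec for frame in encoder_outputs]


# Stand-in for real encoder output: 3 time steps with 6 features each.
encoder_outputs = [[0.0] * 6 for _ in range(3)]
conditioned = condition_on_emotion(encoder_outputs, "angry")

# Each frame grows from 6 to 6 + EMBED_DIM features.
print(len(conditioned), len(conditioned[0]))
```

The decoder would then attend over the widened frames, so the same text can be rendered differently depending on the chosen emotion label.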

thorstenMueller commented 3 years ago

Hi @monatis. Thanks for your fast response, and sorry to hear your parents had health problems. Hopefully things are going well now. I just wanted to be sure you know that my emotional dataset is ready for "whatever". Your ideas sound promising.

monatis commented 3 years ago

@thorstenMueller Yes, they are up and running now 😃 Thanks for the reminder and of course for the great dataset you're giving away to the community 😊 By the way, I'm leading the TensorFlow Turkey community, and we hold regular online events to talk about TensorFlow and machine learning more broadly. If you'd like, I would love to have you as a guest at one of these events to talk about your motivation and experience in building and sharing this dataset. I'm pretty sure it will be a great inspiration for a lot of people. If you agree, you can simply say hello at the email address on my profile to talk about the details.

thorstenMueller commented 3 years ago

> After release means before new ideas are coming to mind. See here for "drunk" samples (just pronounced this way, i'm not drunk while recording ;-) ) https://twitter.com/ThorstenVoice/status/1386060775488430085

Today I've released version 2 of my emotional dataset :tada:. In addition to the emotions from version 1:

It now includes:

Check the details and download it on my GitHub page: https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-Thorsten-emotional

@monatis: Just in case it's interesting for you.