Closed thorstenMueller closed 3 years ago
Recording of "angry" phrases for the emotional dataset is in progress. Always keep in mind that I'm no professional voice artist. https://soundcloud.com/thorsten-mueller-395984278/sets/thorsten-emotional-dataset
@monatis what do you think?
@thorstenMueller Wow, you're doing an amazing job. I can easily hear the emotion in the sample. I strongly believe that it will have a substantial impact in the literature.
Thanks @monatis, I highly appreciate your kind words :+1:. If my dataset were mentioned in the literature it would be an honor for me, but I just hope my voice contribution is useful for someone.
Short update:
I've published some clips on soundcloud:
A comparison of two sentences in all current recorded emotions can be heard here:
Recording of the emotional dataset is finished :speech_balloon: :partying_face: It was harder and took longer than I thought to express emotions on phrases that do not match those emotions, but I tried my best.
Please keep in mind that I'm just a guy sharing his voice with you folks, I'm no professional voice artist.
@domcross is now doing his audio optimization magic on the recordings, and once he's done with that I'll publish them.
Until then - here are two samples of how the results sound:
Mist, wieder nichts geschafft. Es kann doch nicht so schwer sein, einen Ring ins Feuer zu werfen. (English: "Damn, got nothing done again. It can't be that hard to throw a ring into the fire.")
Samples are spoken in the following emotion order:
https://soundcloud.com/thorsten-mueller-395984278/sets/thorsten-de-emotional-dataset
@thorstenMueller This is awesome! Congrats on the hard work and the great job you're doing here 👏 Looking forward to the release of the processed full version.
I'm truly happy to witness such great dataset contributions to the community. Can't wait to work with this one 🚀
Audio optimisation for my emotional dataset contribution is finished (thanks @domcross). I've compressed the audio files + CSV metadata and the upload is ready to go. Release is planned as a gift for the Easter holidays :rabbit:. See my Twitter account for updates (https://twitter.com/ThorstenVoice) and of course, this topic here.
Great news! Thank you @thorstenMueller and @domcross. By the way, Google published a parallel version of Tacotron 2, which is non-autoregressive and thus faster, more stable and more natural according to subjective listening tests. I've started working on this version.
I wonder how you chose these 304 phrases? It is quite difficult to find phrases that match the emotion.
@snakers4 The phrases are not emotion specific. All sentences are generic and I just pronounced them in different emotions, as well as I could, since I am no professional voice talent.
Well, you did a great job, since when I was listening to the "Wütend" phrases, I instantly remembered that scene from "Der Untergang".
On a more serious note, looks like it was very difficult to record ~300 phrases per emotion, since it took ~2 months? Were there any non-obvious issues? I am asking because we are planning to record something similar as well.
Also check this out: https://silyfox.github.io/iscslp-98-demo/ (sounds like anime). Similar work to yours in essence.
Thanks :-) It's not easy to get into the right emotion on sentences that do not match that emotion.
300 phrases in 5 emotions = 1,500 recordings. I didn't record every day, so it took some time until it was finished.
When we recorded speakers for "normal", they could do about 15k phrases in 3-4 weeks, also not recording every day. Looks like recording with emotion is more difficult.
Also I have found @thorstenMueller on telegram, is it you?
No, I'm not using Telegram. You can PM me in the @coqui-ai Gitter chat (https://gitter.im/coqui-ai/TTS).
Hello. I hope you're having some relaxing Easter holidays :rabbit: :slightly_smiling_face:.
As promised, I've released my emotional "Thorsten" dataset :partying_face:. I hope it's useful for someone and (as always) please keep in mind that I'm no professional voice talent, just a guy sharing his voice.
Info and download links can be found here: https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-Thorsten-emotional
I'll close this issue since my emotional dataset has been released. But feel free to post feedback on it here.
After the release is before the next idea. See here for "drunk" samples (just pronounced this way, I wasn't drunk while recording ;-) ) https://twitter.com/ThorstenVoice/status/1386060775488430085
@monatis I'm just curious, did you have time to listen to my emotional dataset and/or find an intended use for it?
Hi @thorstenMueller, I listened to some samples and they sounded quite promising. Unfortunately I was dealing with health problems of my parents at that time, so I missed giving feedback. Sorry for that. I'm currently working on an NLP project in Turkish, then I want to return to working with the Emotional Thorsten dataset to run some experiments with the following: (1) Introducing an emotion vector to the architecture of Tacotron2 and finetuning it for emotional TTS. (2) Training few-shot adaptation methods like AdaDurIAN. Let's see what we can build.
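For readers wondering what "introducing an emotion vector" could look like, here is a minimal sketch under stated assumptions. It is not @monatis's method and not Tacotron2 code: the label set, the function names `one_hot_emotion` and `condition_on_emotion`, and the plain-list "encoder frames" are all hypothetical, chosen only to illustrate the idea of appending a fixed emotion vector to every encoder output frame. A real model would more likely use a learned embedding inside the network.

```python
from typing import List, Sequence

# Illustrative labels only (assumption); not the dataset's official emotion list.
EMOTIONS = ("neutral", "angry", "drunk")


def one_hot_emotion(emotion: str, labels: Sequence[str] = EMOTIONS) -> List[float]:
    """Return a one-hot vector marking `emotion` within `labels`."""
    if emotion not in labels:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return [1.0 if label == emotion else 0.0 for label in labels]


def condition_on_emotion(
    encoder_frames: Sequence[Sequence[float]], emotion: str
) -> List[List[float]]:
    """Append the same emotion vector to every encoder frame (concatenation)."""
    vec = one_hot_emotion(emotion)
    return [list(frame) + vec for frame in encoder_frames]


if __name__ == "__main__":
    # Two toy "encoder frames" with two features each (made-up numbers).
    frames = [[0.1, 0.2], [0.3, 0.4]]
    for row in condition_on_emotion(frames, "angry"):
        print(row)
```

Each frame grows by the number of emotion labels, so whatever consumes the frames afterwards (the decoder, in a Tacotron-style model) would need its input size adjusted accordingly.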
Hi @monatis. Thanks for your fast response and sorry to hear your parents had health problems. Hopefully things are going well now. I just wanted to be sure that you know my emotional dataset is ready for "whatever". Your ideas sound promising.
@thorstenMueller Yes they are up and running now 😃 Thanks for the reminder and of course for the great dataset you're giving away to the community 😊 By the way, I'm leading the TensorFlow Turkey community and we're holding regular online events to talk about TensorFlow things and broader machine learning. If you would like to, I'd love to have you as a guest at one of these events to talk about your motivation and experience in building and sharing this dataset. I'm pretty sure that it will be a great inspiration for a lot of people. If you agree, you can simply say hello at the email address on my profile to talk about the details.
Today I've released version 2 of my emotional dataset :tada:. In addition to emotions from version 1:
It now includes:
Check details and download on my GitHub page https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-Thorsten-emotional
@monatis: Just in case it's interesting for you.
Because there are some interesting papers based on emotional speech datasets, I've decided to go back to the microphone and record a free-to-use emotional German dataset.
I've prepared a German corpus:
I'll record every phrase in the following emotions:
I'm no professional voice actor, so the quality might not be as good as some might expect.
I will keep you updated on the recording process.