padmalcom / Real-Time-Voice-Cloning-German

German model for https://github.com/CorentinJ/Real-Time-Voice-Cloning

Improve Quality of Voice #10

Closed MrRoxMCArthur closed 3 years ago

MrRoxMCArthur commented 3 years ago

Hello @padmalcom, thank you very much again for sharing your work with us :)

I tried recording my own voice with the demo toolbox. I also tried a .wav file with Arnold Schwarzenegger's voice. Unfortunately, the generated voice doesn't sound similar to the references.

Do you know if there is a setting I have missed, or are there further steps (e.g. more training) needed to get a voice that sounds similar?

Best regards, Rox

padmalcom commented 3 years ago

Hi Rox, yes, that is a known problem that Corentin pointed out before. If you look at his product resemble.ai, you will notice that he and his team use a lot of training data recorded by the users to clone their voices. Sadly, this means that, in general, some voices cannot be cloned from just 5 seconds of recorded audio. Nevertheless, you can record some samples of your voice (~100, I guess) and train the synthesizer with them. I'll try that in the next few days and will report back if I can improve the voice quality. Feel free to close this issue if this answers your question.

MrRoxMCArthur commented 3 years ago

Hi Padmalcom,

Thank you very much for your detailed answer :) I'll try training the synthesizer with my voice.

padmalcom commented 3 years ago

Hi Rox, awesome, let me know if you achieve any improvement. To easily record your own data samples, have a look at my tool https://github.com/padmalcom/ttsdatasetcreator :)

Marcophono2 commented 3 years ago

Does the lack of a follow-up mean that you had no success? If it did work, or if you simply haven't found the time yet, I would like to try it with my own voice, based on your nice ttsdatasetcreator.

Best regards, Marc

P.S.: I enjoyed your Udemy course yesterday! :)

padmalcom commented 3 years ago

Hi Marc, sorry for the late reply. I actually had success :) I created approx. 200 one-sentence recordings of my voice, removed all other training data and trained the synthesizer for some 1000 iterations (~1 hour). The results are really impressive. When I find the time, I'll record another video about fine-tuning for the Udemy tutorial.
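
Before starting a fine-tuning run like the one described above, it may help to check how many recordings you have and how much audio they add up to. The following is only a minimal sketch using the Python standard library. The folder layout (one uncompressed PCM .wav file per sentence, with a .txt transcript of the same name next to it) and the function name summarize_recordings are assumptions made for illustration; they are not the format that the synthesizer's preprocessing scripts or ttsdatasetcreator necessarily use.

```python
import wave
from pathlib import Path


def summarize_recordings(folder):
    """Count WAV clips in a folder and add up their duration.

    Returns (clip_count, total_seconds, clips_without_transcript).
    """
    clip_count = 0
    total_seconds = 0.0
    clips_without_transcript = []
    for wav_path in sorted(Path(folder).glob("*.wav")):
        # The wave module reads uncompressed PCM WAV files only.
        with wave.open(str(wav_path), "rb") as wav_file:
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
        clip_count += 1
        # Assumed layout: 0001.wav is paired with a transcript 0001.txt.
        if not wav_path.with_suffix(".txt").is_file():
            clips_without_transcript.append(wav_path.name)
    return clip_count, total_seconds, clips_without_transcript
```

Calling summarize_recordings("my_voice") on a (hypothetical) folder my_voice returns the number of clips, the total duration in seconds and the names of any clips that still lack a transcript, so gaps can be fixed before training.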

Marcophono2 commented 3 years ago

Wow, that sounds wonderful! I am looking forward to training it this weekend. One other question: as far as I understand, this approach trains only the synthesizer, based on individual phonemes. Of course, I pronounce the same word or syllable differently in different contexts. For example, a certain word (or a phoneme in that word) is pronounced differently depending on whether it is part of a question or of a declarative sentence. So, if I'm right, the training would settle on a good average to minimize the loss, which means that context-dependent pronunciation is not supported. Is there a way to use NLP models to make the synthesizer understand which phoneme pronunciation to use in which textual context? Surely we would need much more spoken input, but I think it is doable!

Best regards from Wetter (near Dortmund) Marc
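
The averaging effect described in the question above can be illustrated with a toy calculation. This is only a sketch under strong assumptions: the pitch values are invented, the loss is a plain mean squared error on a single number, and nothing here reflects how the actual synthesizer is implemented or how much context it may already pick up from the surrounding text. It only shows that a model forced to give one answer for two different targets does best at their mean, while a model that is also told the context can match both.

```python
def mse(prediction, targets):
    """Mean squared error of one constant prediction against several targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# Invented pitch values (Hz) for the same word in two sentence types.
targets_by_context = {"question": 220.0, "statement": 140.0}
targets = list(targets_by_context.values())

# Without context, the best single prediction is the average of the targets.
average = sum(targets) / len(targets)
candidates = [140.0, 160.0, 180.0, 200.0, 220.0]
best_single = min(candidates, key=lambda p: mse(p, targets))

# With the context as an extra input, each case can get its own prediction.
context_error = sum(
    (targets_by_context[ctx] - target) ** 2
    for ctx, target in targets_by_context.items()
) / len(targets_by_context)

print("best context-free prediction:", best_single, "error:", mse(best_single, targets))
print("error with context as input:", context_error)
```

In this toy setting, the context-free prediction ends up between the rising and the falling pitch and matches neither, which is the "good average" concern raised in the question.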