huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Text to Speech Generalized End-To-End Loss for Speaker Verification, Real Time Voice Cloning #10137

Open BirgerMoell opened 3 years ago

BirgerMoell commented 3 years ago

🌟 New model addition

Model description

Real-Time Voice Cloning, built on the Generalized End-To-End Loss for Speaker Verification, generates text-to-speech output adapted to a particular speaker from a short audio sample. The model implements the following paper: https://arxiv.org/pdf/1806.04558.pdf. The code is available on GitHub:

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Open source status

The model can be run through Colaboratory. Here is an example of a generated voice. https://soundcloud.com/birger-mo-ll/generated-voice

encoder.load_model(project_name / Path("encoder/saved_models/pretrained.pt"))
synthesizer = Synthesizer(project_name / Path("synthesizer/saved_models/logs-pretrained/taco_pretrained"))
vocoder.load_model(project_name / Path("vocoder/saved_models/pretrained/pretrained.pt"))
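For readers unfamiliar with the loss named in the title, here is a minimal sketch of the softmax variant of the Generalized End-to-End (GE2E) loss in plain Python. It is not taken from the Real-Time-Voice-Cloning code. The function name `ge2e_softmax_loss`, the helper names, and the toy data are made up for illustration, and the loss is averaged over utterances here rather than summed. The defaults `w=10.0` and `b=-5.0` are the initial values of the learnable scale and bias as I recall them from the GE2E paper; a real implementation would train them along with the encoder.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = _normalize(a), _normalize(b)
    return sum(x * y for x, y in zip(a, b))


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Mean GE2E softmax loss (illustrative sketch).

    embeddings[j][i] is the embedding of utterance i from speaker j.
    Each speaker needs at least two utterances, because an utterance is
    left out of its own speaker's centroid.
    """
    embeddings = [[_normalize(e) for e in spk] for spk in embeddings]
    centroids = [_mean(spk) for spk in embeddings]
    total, count = 0.0, 0
    for j, spk in enumerate(embeddings):
        for i, e in enumerate(spk):
            sims = []
            for k, centroid in enumerate(centroids):
                if k == j:
                    # Exclude the utterance itself from its own centroid.
                    centroid = _mean(spk[:i] + spk[i + 1:])
                sims.append(w * _cosine(e, centroid) + b)
            # Per-utterance loss: -S[j] + log(sum_k exp(S[k])),
            # computed with the usual max-shift for numerical stability.
            m = max(sims)
            log_sum = m + math.log(sum(math.exp(s - m) for s in sims))
            total += log_sum - sims[j]
            count += 1
    return total / count


# Toy 2-D embeddings: two speakers, two utterances each (made-up numbers).
separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]

loss_separated = ge2e_softmax_loss(separated)
loss_mixed = ge2e_softmax_loss(mixed)
print(loss_separated, loss_mixed)
```

The loss is small when each utterance sits close to its own speaker's centroid and far from the others, which is what lets the trained encoder summarize a voice from a short sample.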

BirgerMoell commented 3 years ago

@patrickvonplaten This is a suggestion, but there are several models available, and I think the best first step would be to get a Text-To-Speech model working.

I explored Real-Time-Voice-Cloning the other day and noticed it had several issues (the project is no longer maintained), so it might be good to look into other speech models.

Here are some examples of repos that might be useful.

https://github.com/mozilla/TTS

https://github.com/as-ideas/ForwardTacotron

patrickvonplaten commented 3 years ago

Hey @BirgerMoell - thanks a lot for the links! I will take a look soon :-)

bayartsogt-ya commented 3 years ago

@BirgerMoell Thank you for sharing these resources. I also want to add TransformerTTS to the list, since it makes more sense to me to have transformers involved :P

I'd love to see this added to Hugging Face, though.

patrickvonplaten commented 3 years ago

I think it'd make a lot of sense to add FastSpeech2 to the library - happy to help with a PR if someone is interested. See: https://github.com/huggingface/transformers/pull/11135

patrickvonplaten commented 3 years ago

Also, we started integrating https://github.com/as-ideas/TransformerTTS into the model hub so that people have easier access to TensorflowTTS models :-)

https://huggingface.co/tensorspeech/tts-fastspeech2-baker-ch

vishnu-anirudh commented 2 years ago

Hello. To avoid duplication, I just wanted to check whether anyone is working on this and whether it is still relevant. If someone is still needed for this, I would be interested in taking it up.