BirgerMoell commented 3 years ago

🌟 New model addition

Model description

Real-Time Voice Cloning is a way to generate a Text-To-Speech model adapted to a specific speaker from a short audio sample. It relies on a speaker encoder trained with the Generalized End-To-End Loss for Speaker Verification. The model implements the paper at https://arxiv.org/pdf/1806.04558.pdf and the code is available on GitHub:

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Open source status

[ ] the model implementation is available: (give details) https://colab.research.google.com/drive/1SUq5RLOI0TIMkrBzMHMms01aaVNgkO7c?usp=sharing

The model can be run through Colaboratory. Here is an example of a generated voice. https://soundcloud.com/birger-mo-ll/generated-voice

[ ] the model weights are available: (give details) Here are the model weights that are used.

encoder.load_model(project_name / Path("encoder/saved_models/pretrained.pt"))
synthesizer = Synthesizer(project_name / Path("synthesizer/saved_models/logs-pretrained/taco_pretrained"))
vocoder.load_model(project_name / Path("vocoder/saved_models/pretrained/pretrained.pt"))

[ ] who are the authors: @CorentinJ The author is not currently working on the repo, but since it is a fairly popular repo (25,000 stars), it might be reasonable to take the time to explore how to recreate or adapt the model to work with Hugging Face Transformers.
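Since the weights listed above are three separate checkpoints (encoder, synthesizer, vocoder), it can help to check that all of them are present before trying to load anything. Here is a minimal sketch of such a check, assuming `project_name` in the snippet above is the directory of the cloned Real-Time-Voice-Cloning repo. The names `CHECKPOINTS`, `resolve_checkpoints` and `missing_checkpoints` are hypothetical helpers for illustration and are not part of the repo.

```python
from pathlib import Path

# Relative checkpoint locations, copied from the loading snippet in this issue.
CHECKPOINTS = {
    "encoder": Path("encoder/saved_models/pretrained.pt"),
    "synthesizer": Path("synthesizer/saved_models/logs-pretrained/taco_pretrained"),
    "vocoder": Path("vocoder/saved_models/pretrained/pretrained.pt"),
}


def resolve_checkpoints(project_root):
    """Map each component name to its checkpoint path under project_root."""
    root = Path(project_root)
    return {name: root / rel for name, rel in CHECKPOINTS.items()}


def missing_checkpoints(project_root):
    """Return the component names whose checkpoint path does not exist."""
    resolved = resolve_checkpoints(project_root)
    return [name for name, path in resolved.items() if not path.exists()]
```

Calling `missing_checkpoints("Real-Time-Voice-Cloning")` (the directory name here is an assumption) returns an empty list only when all three paths exist, which is a cheap sanity check before running the Colab notebook's loading code.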

BirgerMoell commented 3 years ago

@patrickvonplaten This is a suggestion, but there are several models available, and I think the best first step would be to look into getting a Text-To-Speech model working.

I explored Real-Time-Voice-Cloning the other day and noticed it had several issues (the project is no longer maintained), so it might be good to look into other speech models.

Here are some examples of repos that might be useful.

https://github.com/mozilla/TTS

https://github.com/as-ideas/ForwardTacotron

patrickvonplaten commented 3 years ago

Hey @BirgerMoell - thanks a lot for the links, I will take a look soon :-)

bayartsogt-ya commented 3 years ago

@BirgerMoell Thank you for sharing these resources. I also want to add TransformerTTS to the list, since it makes more sense to me to have transformers involved :P

I'd love to see this addition to huggingface though

patrickvonplaten commented 3 years ago

I think it'd make a lot of sense to add FastSpeech2 to the library - happy to help with a PR if someone is interested. See: https://github.com/huggingface/transformers/pull/11135

patrickvonplaten commented 3 years ago

Also, we started integrating https://github.com/as-ideas/TransformerTTS into the model hub so that people have easier access to TensorflowTTS models :-)

https://huggingface.co/tensorspeech/tts-fastspeech2-baker-ch

vishnu-anirudh commented 2 years ago

Hello. To avoid duplication, I just wanted to check whether anyone is working on this and whether it is still relevant. If someone is still needed for this, I would be interested in taking it up.

huggingface / transformers

Text to Speech Generalized End-To-End Loss for Speaker Verification, Real Time Voice Cloning #10137