JPhilipp closed this issue 7 months ago.
Thank you!
It is possible, but currently the problem with OpenAI's TTS is that their Audio Speech endpoint can't provide word timestamps, which are needed for accurate lip-sync. (Their Transcription endpoint can provide them, so fingers crossed that at some point they add that capability to speech synthesis, too.)
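To show why the timestamps matter, here is a toy illustration. The function and variable names are hypothetical, and this is not how TalkingHead schedules visemes internally. Given per-word start times and durations in milliseconds, a player can look up which word (and so which mouth shapes) should be active at any moment of playback. Without timestamps, the avatar can only guess the pacing of the speech.

```python
from bisect import bisect_right


def word_at(words, wtimes, wdurations, t_ms):
    """Return the word being spoken at playback time t_ms, or None during a pause.

    wtimes must be sorted ascending; all values are in milliseconds.
    """
    # Index of the last word that started at or before t_ms (-1 if none has started yet).
    i = bisect_right(wtimes, t_ms) - 1
    if i >= 0 and t_ms < wtimes[i] + wdurations[i]:
        return words[i]
    return None


# Hypothetical timings for the phrase "Hello world".
words = ["Hello", "world"]
wtimes = [0, 500]
wdurations = [420, 480]
for t in (100, 450, 700, 1200):
    print(t, word_at(words, wtimes, wdurations, t))
```

The gap between 420 ms and 500 ms is a pause, so the avatar's mouth can close there instead of drifting out of sync.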
Cheers!
What if you allowed us to pay extra to get the word timestamps via their Transcription API and have it all work behind the scenes?
(I have made a choose-your-own-adventure Twitch app that already uses their voices, so I would like the virtual host to match that voice quality.)
Yes, that is possible. It is not the most elegant or cost-effective solution, but it works. By using OpenAI's Transcription API, you can make the TalkingHead avatar speak and lip-sync to almost any audio file. There is already a small code example for that; see ./examples/mp3.html. Just add your own OpenAI API key.
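Here is a minimal server-side sketch of that two-step workaround. It uses only Python's standard library and assumes the public OpenAI REST endpoints `/v1/audio/speech` and `/v1/audio/transcriptions`. The model names (`tts-1`, `whisper-1`) and the voice `alloy` are placeholders you may want to change. The sketch synthesizes the text, sends the resulting MP3 back through transcription with word-level timestamps, and returns the audio together with `words`/`wtimes`/`wdurations` arrays in milliseconds. The function names are hypothetical. The output field names are an assumption based on my reading of TalkingHead's `speakAudio` input, so compare them with ./examples/mp3.html before relying on them.

```python
import json
import urllib.request
import uuid

API = "https://api.openai.com/v1"


def _http_post(url, headers, body):
    """POST raw bytes and return the raw response body."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def _multipart(fields, file_field, filename, file_bytes, content_type):
    """Encode form fields plus one file as multipart/form-data.

    fields is a list of (name, value) pairs so names like
    "timestamp_granularities[]" are passed through unchanged.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def speak_with_timestamps(text, api_key, voice="alloy", post=_http_post):
    """Hypothetical helper: TTS audio plus word timings in milliseconds."""
    auth = {"Authorization": f"Bearer {api_key}"}

    # 1) Synthesize speech; the endpoint returns MP3 bytes by default.
    audio = post(
        f"{API}/audio/speech",
        {**auth, "Content-Type": "application/json"},
        json.dumps({"model": "tts-1", "voice": voice, "input": text}).encode(),
    )

    # 2) Transcribe the same audio to recover word-level timestamps
    #    (this is the extra, separately billed API call).
    body, ctype = _multipart(
        [
            ("model", "whisper-1"),
            ("response_format", "verbose_json"),
            ("timestamp_granularities[]", "word"),
        ],
        "file", "speech.mp3", audio, "audio/mpeg",
    )
    result = json.loads(post(f"{API}/audio/transcriptions",
                             {**auth, "Content-Type": ctype}, body))

    # 3) Convert {word, start, end} in seconds to parallel arrays in milliseconds.
    words = result.get("words", [])
    return {
        "audio": audio,
        "words": [w["word"] for w in words],
        "wtimes": [round(w["start"] * 1000) for w in words],
        "wdurations": [round((w["end"] - w["start"]) * 1000) for w in words],
    }
```

Running this on a server keeps the API key out of the browser. The `post` parameter exists only so the network layer can be swapped out, for example in tests. The audio bytes still need to reach the browser (for example base64-encoded) and be decoded into an `AudioBuffer` with `AudioContext.decodeAudioData` before the avatar can play them. Note that the transcribed words may not exactly match the original input text, for example in punctuation or numbers.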
Excellent, thanks!
Fantastic project, thank you!
Is it possible to use OpenAI's Text-to-Speech too? Their voices have superb feel and intonation, and they can switch fluently between languages (without even needing to be set to a specific language).
Thanks!