met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License
296 stars 95 forks

Can I use OpenAI TTS? #16

Closed (JPhilipp closed 6 months ago)

JPhilipp commented 6 months ago

Fantastic project, thank you!

Is it possible to use OpenAI's Text-to-Speech too? Their voices have superb feel and intonation and can move fluently across different languages (without even needing to be set to a specific language).

Thanks!

met4citizen commented 6 months ago

Thank you!

It is possible, but currently the problem with OpenAI's TTS is that their Audio Speech endpoint can't provide word timestamps, which are needed for accurate lip-sync. (Their Transcription endpoint can provide them, so fingers crossed that at some point they add that capability to speech synthesis, too.)

JPhilipp commented 6 months ago

Cheers!

What if you allowed us to pay extra to get the word timestamps via their Transcription API and have it all work behind the scenes?

(I have made a choose-your-own-adventure Twitch app which already uses their voices, so I would like for the virtual host to match the voice quality.)

met4citizen commented 6 months ago

Yes, that is possible. It is not the most elegant or cost-effective solution, but it works. By using OpenAI's Transcription API you can make the TalkingHead avatar speak/lip-sync almost any audio file. There is already a small code example for that; see ./examples/mp3.html. Just add your own OpenAI API key.
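
As a rough illustration of the approach described above (and not a copy of ./examples/mp3.html), the sketch below requests word-level timestamps from OpenAI's Transcription endpoint and converts them to millisecond timings. The helper names `toWordTimings` and `transcribeWithWordTimestamps`, the file name `speech.mp3`, and the output shape (parallel `words`, `wtimes`, `wdurations` arrays) are assumptions made for illustration; check the example file and the TalkingHead README for the exact format the class expects.

```javascript
// Convert word entries ({ word, start, end }, times in seconds) from a
// verbose_json transcription into parallel arrays with times in milliseconds.
// The output field names are an assumption, not confirmed TalkingHead API.
function toWordTimings(words) {
  const result = { words: [], wtimes: [], wdurations: [] };
  for (const w of words) {
    const startMs = Math.round(w.start * 1000);
    const endMs = Math.round(w.end * 1000);
    result.words.push(w.word);
    result.wtimes.push(startMs);
    // Clamp to zero in case a word's end precedes its start.
    result.wdurations.push(Math.max(0, endMs - startMs));
  }
  return result;
}

// Hypothetical helper: send an audio Blob to the Transcription endpoint and
// ask for word-level timestamps. Requires your own OpenAI API key.
async function transcribeWithWordTimestamps(audioBlob, apiKey) {
  const form = new FormData();
  form.append("file", audioBlob, "speech.mp3");
  form.append("model", "whisper-1");
  form.append("response_format", "verbose_json");
  form.append("timestamp_granularities[]", "word");

  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Transcription request failed: ${response.status}`);
  }
  const json = await response.json();
  // Fall back to an empty list if no word entries were returned.
  return toWordTimings(json.words ?? []);
}
```

The conversion is kept separate from the network call so it can be checked without an API key. Note that this route means paying for two OpenAI calls per utterance (speech synthesis, then transcription), which is the cost trade-off mentioned above.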

JPhilipp commented 6 months ago

Excellent, thanks!