met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

Using ElevenLabs even though the ElevenLabs API doesn't provide per-frame visemes? #59

Closed · chillbert closed this issue 3 months ago

chillbert commented 3 months ago

How does this work behind the scenes? In another project I used Azure Speech, which provides visemes, but since ElevenLabs doesn't, how do you move the blendshapes for each phoneme? What I could imagine is that ElevenLabs also provides timestamps for each word (or phoneme?), and that you have some mapping from phonemes to mouth positions (shape key values)?

met4citizen commented 3 months ago

The Ready Player Me avatars come with Oculus viseme blendshapes, and the TalkingHead project includes language-specific lip-sync modules that can convert words into sequences of Oculus visemes. For more information about the lip-sync modules, refer to Appendix C in the README. You can also check out ./modules/lipsync-en.mjs as an example.
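Roughly, the English module can be used on its own like this. The class and method names shown here (`LipsyncEn`, `wordsToVisemes`) and the return shape are indicative only; see `./modules/lipsync-en.mjs` for the exact interface:

```javascript
// Illustrative sketch only. Assumes the English lip-sync module exposes a
// class with a wordsToVisemes(text) method; check ./modules/lipsync-en.mjs
// for the exact class name and return shape.
import { LipsyncEn } from "./modules/lipsync-en.mjs";

const lipsync = new LipsyncEn();

// Convert a word into a sequence of Oculus visemes with relative timings.
const result = lipsync.wordsToVisemes("hello");

// Assumed shape: parallel arrays of viseme names, relative start times,
// and durations, e.g. { visemes: [...], times: [...], durations: [...] }
console.log(result.visemes);
```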

And yes, you are right: ElevenLabs provides timestamps, which are used to synchronize these viseme sequences with the audio.
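As a rough sketch of the idea (not the project's actual code), once you know a word's start time and duration in the audio, you can stretch the module's relative viseme timings over that span. The helper name `scheduleVisemes` and the input/output shapes here are hypothetical:

```javascript
// Hedged sketch: map a word's relative viseme timings onto absolute audio
// time, given the word's start time and duration from the TTS timestamps.
function scheduleVisemes(lipsync, word, wordStartMs, wordDurationMs) {
  const v = lipsync.wordsToVisemes(word); // relative times/durations (assumed shape)
  const total = v.times[v.times.length - 1] + v.durations[v.durations.length - 1];
  const scale = wordDurationMs / total;
  return v.visemes.map((viseme, i) => ({
    viseme,                                  // Oculus viseme name, e.g. "PP"
    start: wordStartMs + v.times[i] * scale, // absolute start in ms
    duration: v.durations[i] * scale,        // stretched duration in ms
  }));
}
```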

Note that the TalkingHead class also accepts visemes and viseme timestamps. So, when using the Microsoft Azure Speech SDK, you don't need to rely on the built-in lip-sync modules. Although Azure uses a slightly different viseme standard, mapping its viseme IDs to the Oculus standard, as shown in the project's test app, is straightforward.
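For illustration, such a mapping can be a simple lookup table over Azure's viseme IDs (0–21). The specific assignments below are a commonly used correspondence, not necessarily the exact table in the test app, so treat the individual entries as approximate and consult the test app for the table actually used:

```javascript
// Illustrative Azure viseme ID -> Oculus viseme mapping (IDs 0..21).
// The exact assignments used by TalkingHead are in the project's test app;
// the vowel entries in particular are approximate here.
const azureToOculus = [
  "sil", // 0  silence
  "aa", "aa", "O", "E",  // 1-4  open/mid vowels
  "E", "I", "U", "O",    // 5-8  vowels and glides
  "O", "O", "aa", "kk",  // 9-12 diphthongs, /h/
  "RR", "nn", "SS", "CH",// 13-16 consonants
  "TH", "FF", "DD", "kk",// 17-20 consonants
  "PP",                  // 21  bilabials (p, b, m)
];

// Convert an Azure viseme event's ID to an Oculus viseme name before
// passing it (with its audio offset) to the TalkingHead class.
function azureVisemeToOculus(id) {
  return azureToOculus[id] ?? "sil";
}
```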