met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

demo #49

justinmann opened 1 week ago

justinmann commented 1 week ago

I love this project!

I am porting it to TypeScript & React Native. When I get it working, I'd be happy to contribute it back if you are interested.

Any advice on where I could research how to build a lip-sync library for a bunch of languages (Chinese, Korean, German, Dutch, ...)?

I would love to show you a demo of how I am using it. https://www.linkedin.com/in/thejustinmann/

met4citizen commented 1 week ago

Thanks!

I'll stick with using browser JavaScript, but TypeScript and React Native have both come up a few times before, so sharing your work on your own repo or elsewhere would likely be of interest to many. Porting to TypeScript is pretty straightforward, but as far as I know, React Native doesn't have built-in support for WebGL rendering like web browsers do. Therefore, you would probably have to use a third-party library, and there might be some limitations.

As for the new lip-sync languages, you have two options. First, you can implement a word-to-viseme class similar to those that currently exist for English, Finnish, and Lithuanian. See README Appendix C for more detailed instructions. Alternatively, you can use Microsoft Speech SDK, which can provide visemes for several languages.

If you speak the language and the language is phonetically orthographic (like Finnish), making a new lip-sync module for the Talking Head isn't that hard. However, if that is not the case, you would need a good understanding of phonology, or - if you are lucky - there might be some existing open-source implementation to start with.
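For a phonetically orthographic language, a word-to-viseme module along the lines described above can be little more than a grapheme-to-viseme rule table. The sketch below is hypothetical: the class name `LipsyncXx`, the rule set, and the equal per-viseme durations are all assumptions for illustration, though the `preProcessText`/`wordsToVisemes` shape mirrors what the README's Appendix C asks a lip-sync module to provide, and the viseme codes follow the Oculus viseme naming the project uses.

```javascript
// Hypothetical minimal lip-sync module for a phonetically orthographic
// language ("Xx"). Real modules handle digraphs, stress, and timing.
class LipsyncXx {

  constructor() {
    // One-letter grapheme-to-viseme rules (Oculus viseme codes).
    // Workable only when spelling closely tracks pronunciation.
    this.rules = {
      'a': 'aa', 'e': 'E', 'i': 'I', 'o': 'O', 'u': 'U',
      'p': 'PP', 'b': 'PP', 'm': 'PP',
      'f': 'FF', 'v': 'FF',
      't': 'DD', 'd': 'DD', 'n': 'nn', 'l': 'nn',
      'k': 'kk', 'g': 'kk',
      's': 'SS', 'r': 'RR'
    };
  }

  // Normalize input: lower-case and drop characters we have no rule for.
  preProcessText(s) {
    return s.toLowerCase().replace(/[^a-zåäö ]/g, '').trim();
  }

  // Map a word to visemes with relative start times and durations,
  // returning the { words, visemes, times, durations } shape.
  wordsToVisemes(w) {
    const o = { words: w, visemes: [], times: [], durations: [] };
    let t = 0;
    for (const ch of w) {
      const viseme = this.rules[ch];
      if (viseme) {
        o.visemes.push(viseme);
        o.times.push(t);
        o.durations.push(1); // equal relative durations; real modules vary these
        t += 1;
      }
    }
    return o;
  }
}
```

For a non-phonetic language, `wordsToVisemes` would instead need a pronunciation dictionary or grapheme-to-phoneme step before the phoneme-to-viseme mapping, which is where the phonology knowledge (or an existing open-source G2P implementation) comes in.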

I'm always interested in seeing the TalkingHead class in action. Would it be possible for you to record a short screen capture video and share the YouTube link here? - If you would rather share the link privately, let me know and we can connect on LinkedIn.

justinmann commented 1 week ago

The React Native port has been fun :). Converting to expo-three is pretty easy, but the GLB loading was a nightmare. RN is missing some basic Blob functionality, so I had to modify GLTFLoader with a hack to get it to work.
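I don't know the exact hack used here, but one common workaround for the missing Blob/object-URL support is to fetch the GLB as a raw `ArrayBuffer` and hand it to `GLTFLoader.parse` directly, bypassing the Blob path that `GLTFLoader.load` relies on. A hypothetical sketch (the function name and the injected `loader` parameter are assumptions):

```javascript
// Hypothetical workaround: load a GLB where Blob/object URLs are unavailable
// (e.g. React Native) by fetching raw bytes and parsing them directly.
// `loader` is expected to be a three.js GLTFLoader instance.
async function loadAvatarGLB(url, loader) {
  const res = await fetch(url);
  const buffer = await res.arrayBuffer(); // raw bytes, no Blob needed
  return new Promise((resolve, reject) => {
    // GLTFLoader.parse accepts an ArrayBuffer plus a resource path,
    // so the Blob/objectURL machinery inside load() is never touched.
    loader.parse(buffer, '', resolve, reject);
  });
}
```

Whether this is enough depends on the asset: externally referenced textures still need a resource path or a custom loading manager, which may be where the remaining hacking comes in.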

I am using Talking Head for bots and/or humans in video calls who have turned off their webcams. In this example, I made a simple language tutor. It is particularly cool when you use it with cloned voices: a person types their message in a video call and Talking Head speaks for them in their own voice.

https://youtube.com/shorts/JXSQo5Jj4qI

met4citizen commented 1 week ago

Wow, using the class with video calls is a great idea! (Your video clip didn't seem to have audio, but I got the gist) -- Looking forward to being able to switch to an autonomous bot-mode in the middle of some tedious video meeting and take a nap :)

I haven't used React Native myself, but last year I did go through some online discussions about using it with Three.js and typically found more questions than answers. It's good to hear that you have managed to make it work. - Can I ask what AI model and TTS engine you're using in your app?