met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

Visemes Value Problem #58

Closed mkaanztrk closed 1 month ago

mkaanztrk commented 1 month ago

Hello,

First of all, amazing project, and thank you for sharing it. I have a question about visemes (mouth movements). As I understand it, RPM characters have blendshapes that produce the mouth movements (visemes), but I couldn't find how to set their values. I converted the project from three.js to babylon.js and am doing this mapping manually.

For example, for the 'aa' viseme I set 'mouthOpen': 1.0 and 'jawOpen': 0.7, and for the 'U' viseme I set 'mouthPucker': 0.8 and 'mouthPressLeft': 0.4.

But these are completely arbitrary values, and I am not sure how to set the correct values for the visemes. I couldn't find any reference in your code. In my opinion, if I say "Hello, my name is John", for example, the mouth should play the corresponding visemes. But those visemes first need to be matched with the correct values. I am not sure what I am missing. I hope I made my point. Thank you.

met4citizen commented 1 month ago

Hi,

If you download an RPM avatar with the Oculus+Visemes parameter as instructed in Appendix A, your avatar will have a blendshape for each viseme. For example, for the viseme 'aa', there will be a blendshape named viseme_aa that you can control. You don't need to construct these blendshapes from other blendshapes.

The purpose of lip-sync modules, such as lipsync-en.mjs, is to convert words into sequences of visemes. The TalkingHead class then synchronizes these visemes with the audio and controls the mouth and lip movements by adjusting the corresponding viseme blendshapes at the appropriate times.

I hope this helps.
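As an illustration of the pipeline described in this reply (not code from the TalkingHead repository), here is a minimal sketch of a viseme timeline and a lookup of which viseme blendshapes are active at a given playback time. The timeline format, the sample entries, and the function name `activeVisemes` are all assumptions made for the example; only the `viseme_` naming of the blendshapes comes from the thread.

```javascript
// Hypothetical timeline format (an assumption, not TalkingHead's internal
// structure). Times are in milliseconds from the start of the audio.
// The entries are made-up sample data, not the output of a lip-sync module.
const sampleTimeline = [
  { viseme: "PP", start: 0, duration: 120 },
  { viseme: "aa", start: 100, duration: 200 },
  { viseme: "U", start: 280, duration: 150 },
];

// Return the blendshape names that should be active at timeMs.
// Neighbouring visemes may overlap, so more than one name can come back.
function activeVisemes(timeline, timeMs) {
  return timeline
    .filter((e) => timeMs >= e.start && timeMs < e.start + e.duration)
    .map((e) => "viseme_" + e.viseme);
}

console.log(activeVisemes(sampleTimeline, 110));
```

In a render loop, the current audio time would be passed in on each frame and the returned blendshapes driven toward their target values.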

mkaanztrk commented 1 month ago

Hi again, and thank you for the response. In the babylon.js editor there is a slider for each morph target, which lets me set a value from 0 to 1. My main confusion is how I would know the correct values for these morph targets. For example, for the "aa" viseme, which morph target should I trigger, and what is the correct amount of influence for that morph target? I am sharing a couple of screenshots. I mean, for the "aa" viseme there should be some influence on a morph target. The screenshot that shows code is from the ChatGPT-converted code. Thank you again.

[Screenshot 2024-08-15 030600] [Screenshot 2024-08-15 030648]

met4citizen commented 1 month ago

For the viseme 'aa', you should adjust the value of the morph target viseme_aa, and similarly for other visemes.
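For a Babylon.js port like the one discussed here, a minimal sketch of setting a morph target by name might look as follows. It assumes `manager` is a Babylon.js `MorphTargetManager` (for example `mesh.morphTargetManager`) and relies only on its `numTargets`, `getTarget(i)`, and the target's `name` and `influence` members; the helper name `setMorphInfluence` is hypothetical.

```javascript
// Set one morph target by name and return true if it was found.
// `manager` is assumed to be a Babylon.js MorphTargetManager.
function setMorphInfluence(manager, targetName, value) {
  // Keep the influence in the 0..1 range that the editor sliders expose.
  const clamped = Math.min(1, Math.max(0, value));
  for (let i = 0; i < manager.numTargets; i++) {
    const target = manager.getTarget(i);
    if (target.name === targetName) {
      target.influence = clamped;
      return true;
    }
  }
  return false; // no target with that name on this mesh
}
```

Usage would be along the lines of `setMorphInfluence(mesh.morphTargetManager, "viseme_aa", 0.6)`. If the avatar is split into several meshes that each carry the viseme targets, the call would need to be repeated for each of them.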

The code you shared seems to be an attempt to mimic visemes using ARKit blendshapes, but that’s unnecessary because RPM models have Oculus viseme blendshapes.

The value for the morph target viseme_aa should smoothly transition from 0 to a peak value, either linearly or with an easing function like a sigmoid, before returning to 0. The ideal transition times, overlap with other visemes, easing function, maximum value, and similar parameters don’t have definitive answers. Start with an educated guess, then test, refine, and iterate until the result appears natural.
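The rise-and-fall shape described above can be sketched as a small envelope function. This is only one possible starting point under stated assumptions, not the TalkingHead implementation: the symmetric rise and fall, the steepness `k = 10`, and the names `makeSigmoidEase` and `visemeValue` are all choices made for this example.

```javascript
// Build a sigmoid easing function that maps 0..1 to 0..1.
// k controls steepness; it is a tunable guess, not a recommended value.
function makeSigmoidEase(k = 10) {
  const s = (x) => 1 / (1 + Math.exp(-k * (x - 0.5)));
  const lo = s(0);
  const hi = s(1);
  // Rescale so that the result is exactly 0 at x = 0 and 1 at x = 1.
  return (x) => (s(Math.min(1, Math.max(0, x))) - lo) / (hi - lo);
}

// Value of one viseme at time t: 0 outside [start, start + duration],
// rising to `peak` at the midpoint and falling back to 0.
function visemeValue(t, start, duration, peak, ease) {
  const u = (t - start) / duration; // normalized position in the viseme
  if (u <= 0 || u >= 1) return 0;
  const phase = u < 0.5 ? u / 0.5 : (1 - u) / 0.5; // 0 -> 1 -> 0
  return peak * ease(phase);
}

const ease = makeSigmoidEase();
console.log(visemeValue(150, 100, 100, 0.6, ease)); // midpoint of the viseme
```

Transition times, overlap between neighbouring visemes, and the peak are the parameters to adjust while testing against real speech.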

mkaanztrk commented 1 month ago

Yes, now I see how to set them. Thank you so much :)
One more thing: is there any best practice or example of optimal values for the mouth visemes? For example, if I set viseme_E to its full value, it looks very ugly. I don't want to set the values the wrong way.

Thank you again for your time.

met4citizen commented 1 month ago

I'm not aware of any best practice or optimal values. As a typical peak value, the TalkingHead class currently uses 0.9 for the visemes 'PP' and 'FF', and 0.6 for the rest, and for the easing function, it uses a sigmoid (see for example the sigmoidFactory method in talkinghead.mjs).
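The peak values mentioned in this reply can be captured in a small lookup, shown here as a sketch. The numbers (0.9 for 'PP' and 'FF', 0.6 otherwise) are the ones quoted above; the names `PEAK_OVERRIDES`, `DEFAULT_PEAK`, and `peakFor` are hypothetical and do not come from talkinghead.mjs.

```javascript
// Peak morph target values quoted in the thread; treat them as a
// starting point for tuning, not as definitive values.
const DEFAULT_PEAK = 0.6;
const PEAK_OVERRIDES = new Map([
  ["PP", 0.9],
  ["FF", 0.9],
]);

// Return the peak value for a viseme name such as "aa" or "PP".
function peakFor(viseme) {
  return PEAK_OVERRIDES.has(viseme) ? PEAK_OVERRIDES.get(viseme) : DEFAULT_PEAK;
}

console.log(peakFor("PP"), peakFor("E"));
```

Keeping the peaks in one table makes it easy to lower a single viseme, such as viseme_E, if it looks exaggerated on a particular avatar.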

mkaanztrk commented 1 month ago

Okay, it's all clear to me now. Thank you again for the replies and everything.