met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

a general control of stillness #31

Closed branaway closed 6 months ago

branaway commented 7 months ago

Thanks for the fantastic project! There is a lot of detail in it, which must have required a lot of tedious work. The code is quite hard to digest, though. For example, I'd like to have a global control over how active or still my model's pose looks in the upper/head view. The model changes pose and moves too much up close in the happy mood. I'd like her to move less often and with less amplitude. Any suggestions?

met4citizen commented 7 months ago

Thank you. This is a good idea for a new feature.

Adding a new attribute to the class that operates as a coefficient for how often the pose is changed would be pretty straightforward. However, limiting the range of movement between poses is much trickier. There would probably have to be an inverse kinematics (IK) algorithm either running in the background or, alternatively, able to modify all the preset poses so that there would be less movement in the upper body. An alternative approach might be to fake this in some way, for example, by fixing the position of the avatar's hip, or letting the camera track the head, but I'm not sure if that would look natural.

All in all, this will take some time to plan and implement.

However, there is already a way to do this, but it requires a bit more work than just adjusting a single value. That is, you can design your own poses and then add a new custom mood (in which you can also adjust how frequently the poses are changed). - I will give a brief outline here for those interested:

In the TalkingHead class, the avatar's movements are based on two data structures: head.poseTemplates and head.animMoods. The former describes all the in-built poses by defining the hip position and body rotations. The latter describes, for different moods, what poses to use, how often they change, how the head moves, when the eyes blink, and so on.

Here is how you can add a new pose to the system:


head.poseTemplates["custom-pose-1"] = {
  standing: true, sitting: false, bend: false, kneeling: false, lying: false,
  props: head.propsToThreeObjects({
    'Hips.position':{x:0, y:0.989, z:0.001}, 'Hips.rotation':{x:0.047, y:0.007, z:-0.007}, 'Spine.rotation':{x:-0.143, y:-0.007, z:0.005}, 'Spine1.rotation':{x:-0.043, y:-0.014, z:0.012}, 'Spine2.rotation':{x:0.072, y:-0.013, z:0.013}, 'Neck.rotation':{x:0.048, y:-0.003, z:0.012}, 'Head.rotation':{x:0.05, y:-0.02, z:-0.017}, 'LeftShoulder.rotation':{x:1.62, y:-0.166, z:-1.605}, 'LeftArm.rotation':{x:1.275, y:0.544, z:-0.092}, 'LeftForeArm.rotation':{x:0, y:0, z:0.302}, 'LeftHand.rotation':{x:-0.225, y:-0.154, z:0.11}, 'LeftHandThumb1.rotation':{x:0.435, y:-0.044, z:0.457}, 'LeftHandThumb2.rotation':{x:-0.028, y:0.002, z:-0.246}, 'LeftHandThumb3.rotation':{x:-0.236, y:-0.025, z:0.113}, 'LeftHandIndex1.rotation':{x:0.218, y:0.008, z:-0.081}, 'LeftHandIndex2.rotation':{x:0.165, y:-0.001, z:-0.017}, 'LeftHandIndex3.rotation':{x:0.165, y:-0.001, z:-0.017}, 'LeftHandMiddle1.rotation':{x:0.235, y:-0.011, z:-0.065}, 'LeftHandMiddle2.rotation':{x:0.182, y:-0.002, z:-0.019}, 'LeftHandMiddle3.rotation':{x:0.182, y:-0.002, z:-0.019}, 'LeftHandRing1.rotation':{x:0.316, y:-0.017, z:0.008}, 'LeftHandRing2.rotation':{x:0.253, y:-0.003, z:-0.026}, 'LeftHandRing3.rotation':{x:0.255, y:-0.003, z:-0.026}, 'LeftHandPinky1.rotation':{x:0.336, y:-0.062, z:0.088}, 'LeftHandPinky2.rotation':{x:0.276, y:-0.004, z:-0.028}, 'LeftHandPinky3.rotation':{x:0.276, y:-0.004, z:-0.028}, 'RightShoulder.rotation':{x:1.615, y:0.064, z:1.53}, 'RightArm.rotation':{x:1.313, y:-0.424, z:0.131}, 'RightForeArm.rotation':{x:0, y:0, z:-0.317}, 'RightHand.rotation':{x:-0.158, y:-0.639, z:-0.196}, 'RightHandThumb1.rotation':{x:0.44, y:0.048, z:-0.549}, 'RightHandThumb2.rotation':{x:-0.056, y:-0.008, z:0.274}, 'RightHandThumb3.rotation':{x:-0.258, y:0.031, z:-0.095}, 'RightHandIndex1.rotation':{x:0.169, y:-0.011, z:0.105}, 'RightHandIndex2.rotation':{x:0.134, y:0.001, z:0.011}, 'RightHandIndex3.rotation':{x:0.134, y:0.001, z:0.011}, 'RightHandMiddle1.rotation':{x:0.288, y:0.014, z:0.092}, 
'RightHandMiddle2.rotation':{x:0.248, y:0.003, z:0.02}, 'RightHandMiddle3.rotation':{x:0.249, y:0.003, z:0.02}, 'RightHandRing1.rotation':{x:0.369, y:0.019, z:0.006}, 'RightHandRing2.rotation':{x:0.321, y:0.004, z:0.026}, 'RightHandRing3.rotation':{x:0.323, y:0.004, z:0.026}, 'RightHandPinky1.rotation':{x:0.468, y:0.085, z:-0.03}, 'RightHandPinky2.rotation':{x:0.427, y:0.007, z:0.034}, 'RightHandPinky3.rotation':{x:0.142, y:0.001, z:0.012}, 'LeftUpLeg.rotation':{x:-0.077, y:-0.058, z:3.126}, 'LeftLeg.rotation':{x:-0.252, y:0.001, z:-0.018}, 'LeftFoot.rotation':{x:1.315, y:-0.064, z:0.315}, 'LeftToeBase.rotation':{x:0.577, y:-0.07, z:-0.009}, 'RightUpLeg.rotation':{x:-0.083, y:-0.032, z:3.124}, 'RightLeg.rotation':{x:-0.272, y:-0.003, z:0.021}, 'RightFoot.rotation':{x:1.342, y:0.076, z:-0.222}, 'RightToeBase.rotation':{x:0.44, y:0.069, z:0.016}
  })
};

The syntax for a new pose is pretty straightforward. Hip position is defined as an (x, y, z) coordinate in meters. Rotations are Euler XYZ rotations in radians. In each pose, the avatar should have its weight on the left foot, if any. The class automatically mirrors it for the other side. Setting the properties standing, sitting, etc., helps the class make the transitions between different poses in proper steps, if needed.

Here is a way to add a new custom mood:


head.animMoods["custom-mood-1"] = {
  baseline: { eyesLookDown: 0.1 },
  speech: { deltaRate: 0, deltaPitch: 0, deltaVolume: 0 },
  anims: [
    { name: 'breathing', delay: 1500, dt: [ 1200,500,1000 ], vs: { chestInhale: [0.5,0.5,0] } },
    { name: 'pose', alt: [
      { p: 0.2, delay: [5000,20000], vs: { pose: ['side'] } },
      { p: 0.2, delay: [5000,20000], vs: { pose: ['hip'] },
        'M': { delay: [5000,20000], vs: { pose: ['wide'] } }
      },
      { delay: [5000,20000], vs: { pose: ['custom-pose-1'] } }
    ]},
    { name: 'head',
      idle: { delay: [0,1000], dt: [ [200,5000] ], vs: { headRotateX: [[-0.04,0.10]], headRotateY: [[-0.3,0.3]], headRotateZ: [[-0.08,0.08]] } },
      talking: { dt: [ [0,1000,0] ], vs: { headRotateX: [[-0.05,0.15,1,2]], headRotateY: [[-0.1,0.1]], headRotateZ: [[-0.1,0.1]] } }
    },
    { name: 'eyes', delay: [200,5000], dt: [ [100,500],[100,5000,2] ], vs: { eyesRotateY: [[-0.6,0.6]], eyesRotateX: [[-0.2,0.6]] } },
    { name: 'blink', delay: [1000,8000,1,2], dt: [50,[100,300],100], vs: { eyeBlinkLeft: [1,1,0], eyeBlinkRight: [1,1,0] } },
    { name: 'mouth', delay: [1000,5000], dt: [ [100,500],[100,5000,2] ], vs : { mouthRollLower: [[0,0.3,2]], mouthRollUpper: [[0,0.3,2]], mouthStretchLeft: [[0,0.3]], mouthStretchRight: [[0,0.3]], mouthPucker: [[0,0.3]] } },
    { name: 'misc', delay: [100,5000], dt: [ [100,500],[100,5000,2] ], vs : { eyeSquintLeft: [[0,0.3,3]], eyeSquintRight: [[0,0.3,3]], browInnerUp: [[0,0.3]], browOuterUpLeft: [[0,0.3]], browOuterUpRight: [[0,0.3]] } }
  ]
};
head.setMood("custom-mood-1");

The syntax here is more complex. If you want to change what poses are used and how often they get changed, you should look at the object with the name "pose". The class first iterates through the nested hierarchy of objects by following names that match the current state (idle, talking), body form (M, F), current view (full, upper, mid, head), and/or probabilities (alt). On each leaf object, there is a command pose that sets the new pose. The property delay determines how long that pose is held in milliseconds. If the delay value is an array, it defines a range for a uniform/Gaussian random value (Gaussian approximated using CLT).
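To make the traversal described above more concrete, here is a minimal sketch of how such a walk could look. The function name resolveTemplate, the ctx object, and the injectable rnd parameter are hypothetical and not part of the class; the real logic lives inside TalkingHead and may check keys in a different order.

```javascript
// Hypothetical helper, not the library's own code: walks a mood template
// down to the leaf that matches the current state, body form, and view.
function resolveTemplate(template, ctx, rnd = Math.random) {
  let node = template;
  for (;;) {
    if (node[ctx.state]) {
      node = node[ctx.state];            // e.g. 'idle' or 'talking'
    } else if (node[ctx.body]) {
      node = node[ctx.body];             // e.g. 'M' or 'F'
    } else if (node[ctx.view]) {
      node = node[ctx.view];             // e.g. 'full', 'upper', 'mid', 'head'
    } else if (Array.isArray(node.alt)) {
      // Pick one alternative by probability p. An entry without p is
      // assumed to take whatever probability is left over.
      let r = rnd();
      let chosen = node.alt[node.alt.length - 1];
      for (const a of node.alt) {
        if (a.p === undefined || r < a.p) { chosen = a; break; }
        r -= a.p;
      }
      node = chosen;
    } else {
      return node;                       // leaf: carries delay and vs.pose
    }
  }
}

// The 'pose' entry from the custom mood above.
const poseTemplate = { name: 'pose', alt: [
  { p: 0.2, delay: [5000, 20000], vs: { pose: ['side'] } },
  { p: 0.2, delay: [5000, 20000], vs: { pose: ['hip'] },
    'M': { delay: [5000, 20000], vs: { pose: ['wide'] } }
  },
  { delay: [5000, 20000], vs: { pose: ['custom-pose-1'] } }
]};

const leaf = resolveTemplate(poseTemplate, { state: 'idle', body: 'F', view: 'upper' });
console.log(leaf.vs.pose[0]); // one of 'side', 'hip', 'custom-pose-1'
```

Under these assumptions, the second alternative resolves to the 'wide' pose for body form M and to 'hip' otherwise, because the walk continues into the 'M' key after the alternative has been picked.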

You can look in the source code for more examples of poses and moods and use them as templates.

branaway commented 7 months ago

Wow, thanks for the detailed explanations.

1) Re: "able to modify all the preset poses so that there would be less movement in the upper body." Is there an algorithmic way of making the preset poses move with an amplitude controlled by a coefficient? Would slerping the quaternions of all the joints towards all-zeros make the poses closer to the still position? 2) How to: "Adding a new attribute to the class that operates as a coefficient for how often the pose is changed would be pretty straightforward." 3) Having the camera follow the face in every animation frame seems within reach of my three.js skills. I'll give it a shot.

branaway commented 7 months ago

Hi, can you explain what all the numbers mean in the following line, particularly the dt property and the array of array of numbers for morphtarget eyeSquintLeft in this case:

{ name: 'misc', delay: [100,5000], dt: [ [100,500],[100,5000,2] ], vs : { eyeSquintLeft: [[0,0.3,3]], ...} }

met4citizen commented 7 months ago

Would slerping the quaternions of all the joints towards all-zeros make the poses closer to the still position?

I don't know. It doesn't really work for the arms, but if you only limit the rotations of the legs, hips, spine, and neck, it might work as long as the avatar is in a standing position.

How to: Adding a new attribute to the class that operates as a coefficient for how often the pose is changed would be pretty straightforward.

The actual delay values are calculated based on the mood animation templates in the method animFactory. That would be the place to scale the delay when the animation template is the right one (t.name === 'pose'). However, whether you want to do this or not depends on whether by "stillness" you want to limit the movement not only in space, but also in time.
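As a rough illustration of that idea, the sketch below scales a delay specification only when the template is the 'pose' template. The function poseAwareDelay and the coefficient poseDelayFactor are hypothetical names, not existing class members, and the assumption is that any array elements after the first two (such as a skew) should be left untouched.

```javascript
// Hypothetical helper: scales the delay of the 'pose' template by a
// coefficient and leaves every other template unchanged.
function poseAwareDelay(templateName, delay, poseDelayFactor = 1) {
  if (templateName !== 'pose' || delay === undefined) return delay;
  if (Array.isArray(delay)) {
    // [min, max, ...rest]: scale the range, keep the extra parameters.
    const [min, max, ...rest] = delay;
    return [min * poseDelayFactor, max * poseDelayFactor, ...rest];
  }
  return delay * poseDelayFactor;        // fixed delay in milliseconds
}

// A factor of 2 would hold each pose roughly twice as long.
console.log(poseAwareDelay('pose', [5000, 20000], 2));
console.log(poseAwareDelay('blink', [1000, 8000, 1, 2], 2));
```

A factor above 1 makes pose changes less frequent, while a factor below 1 makes them more frequent; the blink template in the second call is returned as is.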

met4citizen commented 7 months ago

Hi, can you explain what all the numbers mean in the following line, particularly the dt property and the array of array of numbers for morphtarget eyeSquintLeft in this case:

{ name: 'misc', delay: [100,5000], dt: [ [100,500],[100,5000,2] ], vs : { eyeSquintLeft: [[0,0.3,3]], ...} }

The object as a whole is a template that represents an animation loop named 'misc'. Based on this template, the next animation in the loop will be created by using the animFactory.

delay is the delay between the animations in the loop specified in milliseconds. Here the value is an array, so the actual value will be chosen randomly each time between 100 ms and 5 seconds. To be more specific, the value is the mean of random samples from a uniform distribution between 100-5000 ms.

dt array defines the timeline of the animation. Here the array has two elements, which means that the animation has been divided into two parts. The duration of the first part is 100-500 ms, and the duration of the second part is 100-5000 ms. The third parameter 2 skews the latter value towards the lower end of the distribution (see the class method gaussianRandom for more information).

vs object defines the target values for the shape keys for each part. Typically, the shape keys have values 0-1. Here, the eyeSquintLeft numbers mean that during the first part of the animation, the shape key value eases from its present value to the value 0-0.3 (skewed towards the lower end). Typically, there would be a second value for the second part of the animation, but since the value is missing, the value stays the same during the second part. Once the animation has ended, the shape key will automatically return to its baseline value, which is typically 0.
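The way these numbers turn into concrete values can be sketched as follows. This is an approximation written for illustration, assuming that a range is sampled as the mean of a few uniform random numbers raised to the power of the skew; the function names sampleRange and sampleValue, the default of 5 samples, and the injectable rnd parameter are assumptions, so check the class method gaussianRandom for the actual behavior.

```javascript
// Sketch of a CLT-style sampler: the mean of several uniform samples is
// bell-shaped, and a skew above 1 pushes results towards the lower end.
function sampleRange(min, max, skew = 1, samples = 5, rnd = Math.random) {
  let sum = 0;
  for (let i = 0; i < samples; i++) sum += rnd();
  return min + Math.pow(sum / samples, skew) * (max - min);
}

// A plain number is used as is; an array is treated as [min, max, skew, samples].
function sampleValue(spec, rnd = Math.random) {
  return Array.isArray(spec)
    ? sampleRange(spec[0], spec[1], spec[2], spec[3], rnd)
    : spec;
}

// The 'misc' template from the question, reduced to one shape key.
const misc = {
  name: 'misc', delay: [100, 5000], dt: [[100, 500], [100, 5000, 2]],
  vs: { eyeSquintLeft: [[0, 0.3, 3]] }
};

const delayMs = sampleValue(misc.delay);               // wait before the animation
const partDurations = misc.dt.map(d => sampleValue(d)); // two parts, in milliseconds
const squintTarget = sampleValue(misc.vs.eyeSquintLeft[0]); // target for part one
console.log(delayMs, partDurations, squintTarget);
```

Each run produces different numbers, but they always stay inside the given ranges, and the skewed entries tend to land nearer the minimum.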

met4citizen commented 7 months ago

There is now a new option modelMovementFactor that limits the avatar's upper body movement when standing. It works by rotating the hip, legs, spine, and neck towards the "straight" pose. The value range is [0, 1]. A lower value means less movement relative to the "straight" pose.

You can set the option when creating the TalkingHead instance and/or change it later by calling head.opt.modelMovementFactor = 0.3;.
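For readers who want to experiment with the same idea on their own pose data, here is a simplified sketch of pulling selected bones towards a reference pose. It is not the class's implementation: the function blendTowardStraight and the bone list are hypothetical, and it interpolates Euler components linearly, which is only a reasonable approximation for small angles and does not handle values that wrap around ±π.

```javascript
// Hypothetical bone list, following the description above.
const LIMITED_BONES = ['Hips', 'Spine', 'Spine1', 'Spine2', 'Neck',
  'LeftUpLeg', 'LeftLeg', 'RightUpLeg', 'RightLeg'];

// Returns a copy of the pose in which the rotations of the listed bones
// are moved towards the reference pose. factor 1 keeps the pose as is,
// factor 0 replaces those rotations with the reference rotations.
function blendTowardStraight(pose, straight, factor, bones = LIMITED_BONES) {
  const out = { ...pose };
  for (const [key, value] of Object.entries(pose)) {
    const [bone, prop] = key.split('.');
    const ref = straight[key];
    if (prop !== 'rotation' || !ref || !bones.includes(bone)) continue;
    out[key] = {
      x: ref.x + (value.x - ref.x) * factor,
      y: ref.y + (value.y - ref.y) * factor,
      z: ref.z + (value.z - ref.z) * factor
    };
  }
  return out;
}

// Made-up values for illustration only.
const straightPose = { 'Spine.rotation': { x: 0, y: 0, z: 0 } };
const leaningPose = {
  'Spine.rotation': { x: 0.2, y: 0, z: -0.1 },
  'LeftArm.rotation': { x: 1.275, y: 0.544, z: -0.092 }
};
const calmer = blendTowardStraight(leaningPose, straightPose, 0.5);
console.log(calmer['Spine.rotation']);
```

The arms are deliberately left out of the bone list, so their rotations pass through unchanged.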

I think it works pretty well, but let me know what you think.

branaway commented 7 months ago

There is now a new option modelMovementFactor that limits the avatar's upper body movement when standing. It works by rotating the hip, legs, spine, and neck towards the "straight" pose. The value range is [0, 1]. A lower value means less movement relative to the "straight" pose.

You can set the option when creating the TalkingHead instance and/or change it later by calling head.opt.modelMovementFactor = 0.3;.

I think it works pretty well, but let me know what you think.

Well, as you said, the body pose is a combination of all body parts. Changing only the torso leads, as expected, to a broken posture, as shown below:

[Screenshot: Screen Shot 2024-04-24 at 07 53 48]

met4citizen commented 7 months ago

Yes, it's not perfect. I tried rotating the arms too, but the hands/arms ended up in mid-air at an unnatural angle. In my own test setup, I also have some poses where the hand is raised or pointing at something, and there the effect was very noticeable and unwanted.

A good IK algorithm might fix these issues, but making it good enough to cover such details would require a lot of effort and expertise, especially since the avatar can be changed, so its body form and proportions are not fixed.

However, I think the main use of this feature is, like you said, when you have a close-up view (upper/head) and want to keep it in the middle of the view. In those cases, the hands are typically not shown, so they can be ignored.

If there is a need for more control over details, the best option is to create custom poses and moods.