tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

The results obtained after running the demo are inconsistent with those shown. #23


xiao-keeplearning commented 1 month ago

I ran the demo code for scenario 2 and got talk_tys_fix_face.mp4, but the video is not the same as the one shown in the README, and my result looks a little worse.

https://github.com/tencent-ailab/V-Express/assets/26853334/8d3d7212-2fc7-475a-9706-a120c1cda3db

tiankuan93 commented 1 month ago

We've adjusted the default weights for reference_attention_weight and audio_attention_weight to make mouth movements more pronounced. You can turn up reference_attention_weight so the model maintains higher character consistency, and turn down audio_attention_weight to reduce mouth artifacts, as shown below.

```shell
python inference.py \
    --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
    --audio_path "./test_samples/short_case/tys/aud.mp3" \
    --output_path "./output/short_case/talk_tys_fix_face.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 1.0
```
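
If you want to compare several trade-offs between character consistency and mouth movement in one run, a small sweep script can help. This is a minimal sketch that shells out to inference.py using the flags shown above; the specific weight values in the grid are illustrative assumptions, not recommendations from the maintainers.

```python
import itertools
import subprocess

# Illustrative weight grid (assumed values, not official defaults).
# Per the comment above: higher reference_attention_weight favors
# character consistency; lower audio_attention_weight reduces
# mouth artifacts.
reference_weights = [0.95, 1.0, 1.05]
audio_weights = [1.0, 2.0, 3.0]

for ref_w, aud_w in itertools.product(reference_weights, audio_weights):
    # Encode the weights in the output name so results are easy to compare.
    output = f"./output/short_case/talk_tys_ref{ref_w}_aud{aud_w}.mp4"
    subprocess.run(
        [
            "python", "inference.py",
            "--reference_image_path", "./test_samples/short_case/tys/ref.jpg",
            "--audio_path", "./test_samples/short_case/tys/aud.mp3",
            "--output_path", output,
            "--retarget_strategy", "fix_face",
            "--num_inference_steps", "25",
            "--reference_attention_weight", str(ref_w),
            "--audio_attention_weight", str(aud_w),
        ],
        check=True,
    )
```

Each run writes a separate video, so you can pick the weight pair that best balances identity preservation against lip-sync strength for your input.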