tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

The results obtained after running the demo are inconsistent with those shown. #23


xiao-keeplearning commented 1 month ago

I ran the demo code for scenario 2 and got talk_tys_fix_face.mp4, but the video is not the same as the one shown in the README, and my result looks a little worse.

https://github.com/tencent-ailab/V-Express/assets/26853334/8d3d7212-2fc7-475a-9706-a120c1cda3db

tiankuan93 commented 1 month ago

We've adjusted the default weights for reference_attention_weight and audio_attention_weight to make mouth movements more pronounced. You can turn up reference_attention_weight so the model maintains higher character consistency, and turn down audio_attention_weight to reduce mouth artifacts, as shown below.

```shell
python inference.py \
    --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
    --audio_path "./test_samples/short_case/tys/aud.mp3" \
    --output_path "./output/short_case/talk_tys_fix_face.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 1.0
```
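
If you want to compare several trade-offs between character consistency and mouth movement in one run, a small sweep script can help. This is a minimal sketch that shells out to inference.py using the flags shown above; the specific weight values in the grid are illustrative assumptions, not recommendations from the maintainers.

```python
import itertools
import subprocess

# Illustrative weight grid (assumed values, not official defaults).
# Per the comment above: higher reference_attention_weight favors
# character consistency; lower audio_attention_weight reduces
# mouth artifacts.
reference_weights = [0.95, 1.0, 1.05]
audio_weights = [1.0, 2.0, 3.0]

for ref_w, aud_w in itertools.product(reference_weights, audio_weights):
    # Encode the weights in the output name so results are easy to compare.
    output = f"./output/short_case/talk_tys_ref{ref_w}_aud{aud_w}.mp4"
    subprocess.run(
        [
            "python", "inference.py",
            "--reference_image_path", "./test_samples/short_case/tys/ref.jpg",
            "--audio_path", "./test_samples/short_case/tys/aud.mp3",
            "--output_path", output,
            "--retarget_strategy", "fix_face",
            "--num_inference_steps", "25",
            "--reference_attention_weight", str(ref_w),
            "--audio_attention_weight", str(aud_w),
        ],
        check=True,
    )
```

Each run writes a separate video, so you can pick the weight pair that best balances identity preservation against lip-sync strength for your input.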