xiao-keeplearning opened this issue 1 month ago
We've adjusted the default values of `reference_attention_weight` and `audio_attention_weight` to make mouth movements more pronounced. You can increase `reference_attention_weight` to make the model maintain higher character consistency, and decrease `audio_attention_weight` to reduce mouth artifacts, as shown below.
```shell
python inference.py \
    --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
    --audio_path "./test_samples/short_case/tys/aud.mp3" \
    --output_path "./output/short_case/talk_tys_fix_face.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 1.0
```
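To illustrate the tuning advice above, here is a variant of the same command with the weights nudged in the suggested directions. The flag names are taken from the command above; the specific values 1.2 and 0.8 are hypothetical starting points for experimentation, not values recommended by the authors.

```shell
# Illustrative only: raise reference_attention_weight for stronger character
# consistency, lower audio_attention_weight to reduce mouth artifacts.
# 1.2 / 0.8 are assumed example values, not tested defaults.
python inference.py \
    --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
    --audio_path "./test_samples/short_case/tys/aud.mp3" \
    --output_path "./output/short_case/talk_tys_fix_face.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.2 \
    --audio_attention_weight 0.8
```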
I ran the demo code for scenario 2 and got talk_tys_fix_face.mp4, but the video does not match the result shown in the README, and mine looks a little worse.
https://github.com/tencent-ailab/V-Express/assets/26853334/8d3d7212-2fc7-475a-9706-a120c1cda3db