tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

the video mouth shape is the same with the reference? #29

Open guoyilin opened 4 weeks ago

guoyilin commented 4 weeks ago

Great job. I tested the mouth shape: when I input a reference image with a smiling mouth, the output video keeps the smile while speaking (the mouth shape should change according to the audio). What is the reason? Is it a problem with the MEAD data (each video contains only one expression)?

zhangjun001 commented 2 weeks ago

If you set retarget_strategy to "no_retarget", it is highly recommended to set reference_attention_weight > 2.
python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
    --audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
    --kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
    --output_path "./output/short_case/talk_AOC_raw_kps_chattts_no_retarget.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 3.0 \
    --save_gpu_memory

Generally, if a front-view reference image and kps are used, naive_retarget works better. Note that audio_attention_weight should be set to 1.0 in that case.
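As a sketch of that suggestion, the command above could be adapted for the naive_retarget strategy like this (reusing the same sample files; the output filename is a hypothetical choice):

python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
    --audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
    --kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
    --output_path "./output/short_case/talk_AOC_naive_retarget.mp4" \
    --retarget_strategy "naive_retarget" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 1.0 \
    --save_gpu_memory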