tencent-ailab / V-Express

V-Express aims to generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.

Is the video mouth shape the same as the reference? #29

Open guoyilin opened 5 months ago

guoyilin commented 5 months ago

Great job. I tested the mouth shape: when I input a reference image with a smiling mouth, the output video keeps the smile while speaking (the mouth shape should change according to the audio). What's the reason? Is it a problem with the MEAD data (each video always has only one expression)?

zhangjun001 commented 5 months ago

If you set retarget_strategy to "no_retarget", it is highly recommended to use reference_attention_weight > 2.
```shell
python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
    --audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
    --kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
    --output_path "./output/short_case/talk_AOC_raw_kps_chattts_no_retarget.mp4" \
    --retarget_strategy "fix_face" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 3.0 \
    --save_gpu_memory
```

Generally, if a front-view reference and kps are used, naive_retarget works better. Note: audio_attention_weight should be set to 1.0 in that case.
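As a sketch of the naive_retarget variant described above: this is the same invocation as the example, with only the retarget strategy and audio_attention_weight changed per the comment. The "naive_retarget" flag value and the unchanged remaining weights are assumptions on my part, inferred from the example rather than confirmed by the maintainer.

```shell
# Hypothetical variant of the example command for the front-view case.
# Only --retarget_strategy and --audio_attention_weight differ; the
# output filename is adjusted to match, and other flags are assumed unchanged.
python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
    --audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
    --kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
    --output_path "./output/short_case/talk_AOC_raw_kps_chattts_naive_retarget.mp4" \
    --retarget_strategy "naive_retarget" \
    --num_inference_steps 25 \
    --reference_attention_weight 1.0 \
    --audio_attention_weight 1.0 \
    --save_gpu_memory
```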