guoyilin opened this issue 5 months ago
If you set retarget_strategy to "no_retarget", it is highly recommended to set reference_attention_weight > 2.
python inference.py \
--reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
--audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
--kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
--output_path "./output/short_case/talk_AOC_raw_kps_chattts_no_retarget.mp4" \
--retarget_strategy "fix_face" \
--num_inference_steps 25 \
--reference_attention_weight 1.0 \
--audio_attention_weight 3.0 \
--save_gpu_memory
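For comparison, a run that actually follows the no_retarget recommendation above might look like the sketch below. It reuses the same inputs; the retarget_strategy and reference_attention_weight values are changed per the note that reference_attention_weight should be > 2 (the 3.0 value and the output filename are illustrative assumptions, not values from the repo docs).

# hypothetical no_retarget variant of the command above
python inference.py \
--reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
--audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
--kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
--output_path "./output/short_case/talk_AOC_no_retarget.mp4" \
--retarget_strategy "no_retarget" \
--num_inference_steps 25 \
--reference_attention_weight 3.0 \
--audio_attention_weight 3.0 \
--save_gpu_memory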
Generally, if a front-view reference image and kps are used, naive_retarget works better. Note that audio_attention_weight is set to 1.0 in that case.
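A front-view run following that advice might look like the sketch below, again reusing the inputs from the command above. The audio_attention_weight of 1.0 follows the note; keeping reference_attention_weight at 1.0 and the output filename are assumptions for illustration.

# hypothetical naive_retarget variant for a front-view reference and kps
python inference.py \
--reference_image_path "./test_samples/short_case/AOC/ref.jpg" \
--audio_path "./test_samples/short_case/AOC/v_exprss_intro_chattts.mp3" \
--kps_path "./test_samples/short_case/AOC/AOC_raw_kps.pth" \
--output_path "./output/short_case/talk_AOC_naive_retarget.mp4" \
--retarget_strategy "naive_retarget" \
--num_inference_steps 25 \
--reference_attention_weight 1.0 \
--audio_attention_weight 1.0 \
--save_gpu_memory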
Great job. I tested the mouth shape: when I input a reference image with a smiling mouth, the output video keeps the smile while speaking (the mouth shape should change according to the audio). What is the reason? Is it a problem with the MEAD data (a video always has only one expression)?