My test results are not very good?

zhanghongyong123456 commented 1 month ago

My driving video. Preprocessing command:

    python scripts/extract_kps_sequence_and_audio.py \
        --video_path "./test_samples/short_case/10/gt.mp4" \
        --kps_sequence_save_path "./test_samples/short_case/10/kps.pth" \
        --audio_save_path "./test_samples/short_case/10/aud.mp3"

https://github.com/tencent-ailab/V-Express/assets/48466610/60db96fb-1841-4de3-acf2-430407236a4a

My reference image (a 512x512 screenshot): 003

My result. The command I ran:

    python inference.py \
        --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
        --audio_path "./test_samples/short_case/tys/aud.mp3" \
        --kps_path "./test_samples/short_case/tys/kps.pth" \
        --output_path "./output/short_case/talk_tys_fix_face.mp4" \
        --retarget_strategy "fix_face"

https://github.com/tencent-ailab/V-Express/assets/48466610/b4ce5263-9eb1-4478-b126-36859346912f

I'm not sure what I'm doing wrong. Any pointers would be appreciated.

FurkanGozukara commented 1 month ago

I planned to make a Gradio app for this, but this result looks very bad.

tiankuan93 commented 1 month ago

Our model is trained on English audio, and our audio feature extractor is also trained on English, so for now the model performs more consistently on English audio. Other languages may yield reasonable results, but that will require some experimentation with the parameters.
For the "fix_face" mode, we provide parameters to adjust the effect of the audio. We have also updated the default parameters in a new commit.

We get the same results as you if we use reference_attention_weight=1.0 and audio_attention_weight=1.0.
We get the results below if we use reference_attention_weight=0.95 and audio_attention_weight=3.0.

https://github.com/tencent-ailab/V-Express/assets/19601425/d88f9a9b-cc06-4476-b997-82fcb88e57d4
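Since non-English audio may need some experimentation with these two weights, one way to organize that is a small sweep. The following is a minimal sketch, not part of the repository: it only builds and prints command lines and does not run the model. The helper names (`build_inference_command`, `sweep_commands`), the output file naming, and the extra weight value 2.0 are hypothetical. It also assumes that `inference.py` accepts the two weights as `--reference_attention_weight` and `--audio_attention_weight`; check `python inference.py --help` in your checkout before relying on that.

```python
import itertools
import shlex


def build_inference_command(reference_attention_weight, audio_attention_weight,
                            case_dir="./test_samples/short_case/tys",
                            output_dir="./output/short_case"):
    """Return the argument list for one weight combination.

    The two weight flag names are assumed to match the parameter names
    mentioned in this thread; the output file naming is hypothetical.
    """
    tag = f"ref{reference_attention_weight}_aud{audio_attention_weight}"
    return [
        "python", "inference.py",
        "--reference_image_path", f"{case_dir}/ref.jpg",
        "--audio_path", f"{case_dir}/aud.mp3",
        "--kps_path", f"{case_dir}/kps.pth",
        "--output_path", f"{output_dir}/talk_tys_fix_face_{tag}.mp4",
        "--retarget_strategy", "fix_face",
        "--reference_attention_weight", str(reference_attention_weight),
        "--audio_attention_weight", str(audio_attention_weight),
    ]


def sweep_commands(reference_weights, audio_weights):
    """Build one command per (reference weight, audio weight) pair."""
    return [
        build_inference_command(ref_w, aud_w)
        for ref_w, aud_w in itertools.product(reference_weights, audio_weights)
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed, then run by hand.
    for cmd in sweep_commands([1.0, 0.95], [1.0, 2.0, 3.0]):
        print(shlex.join(cmd))
```

Writing each run to its own output file makes it easier to compare lip movement across settings side by side.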

If we crop the reference image more properly and use English audio, we get the following results.

https://github.com/tencent-ailab/V-Express/assets/19601425/79e4ddfe-e9a1-4ba1-b4ce-294023a0f1ab

https://github.com/tencent-ailab/V-Express/assets/19601425/4d4a04e8-d412-40a8-a370-1fc603addf6d
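The thread does not say what cropping "more properly" means in detail. As one possible reading, the sketch below computes a square crop box centered on the face with some margin around it, clamped to the image bounds. Everything here is an assumption for illustration: the function name `square_crop_box`, the `margin` default of 0.5, and the idea that a centered square crop is what the maintainers used. The face box must come from a face detector or manual annotation of your choice.

```python
def square_crop_box(face_box, image_size, margin=0.5):
    """Return a square (left, top, right, bottom) crop around a face.

    face_box:   (left, top, right, bottom) of the face, in pixels.
    image_size: (width, height) of the full image, in pixels.
    margin:     extra space on each side, as a fraction of the face size
                (hypothetical default, tune by eye).
    """
    left, top, right, bottom = face_box
    width, height = image_size

    # Side length: the larger face dimension plus margin on both sides,
    # capped so the square still fits inside the image.
    side = max(right - left, bottom - top) * (1 + 2 * margin)
    side = int(min(side, width, height))

    # Center the square on the face, then shift it back inside the image.
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    x0 = min(max(int(center_x - side / 2), 0), width - side)
    y0 = min(max(int(center_y - side / 2), 0), height - side)

    return (x0, y0, x0 + side, y0 + side)


if __name__ == "__main__":
    print(square_crop_box((200, 150, 300, 250), (512, 512)))
```

The returned tuple uses the same (left, upper, right, lower) order as Pillow's `Image.crop`, so it can be passed to that method directly before resizing the result to the resolution you want.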

FurkanGozukara commented 1 month ago

@tiankuan93 Thanks, that is a lot of useful info. Do you plan to make a Gradio demo app, or do we have to make one ourselves?

tiankuan93 commented 1 month ago

@FurkanGozukara We have no plans to implement a Gradio demo app for now. Thank you for your interest.

FurkanGozukara commented 1 month ago

Thanks. Then hopefully I will build it myself and publish it.

I hope not much changes with the consistency LoRA, so I can implement that too.

tiankuan93 commented 1 month ago

The consistency LoRA only reduces the number of inference steps and doesn't change much else.

cantonalex commented 1 month ago

@zhanghongyong123456 What retarget strategy did you use for your first video? How did you get the output video uncropped relative to the reference face?

EDIT: Oh, that's your source video. Bummer.

boboji21 commented 1 month ago

Awesome!

tencent-ailab / V-Express

My test results are not very good? #6