tanshuai0219 / EDTalk

[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
Apache License 2.0

Result is not good for teeth only #18

Closed · nitinmukesh closed 2 months ago

nitinmukesh commented 2 months ago

Teeth are all messed up. Any suggestions?

Lip motion:

python demo_lip_pose.py --fix_pose --source_path ./test_data/identity_source.jpg --audio_driving_path test_data/teaser.mp3 --save_path ./output/1.mp4 --face_sr

https://github.com/user-attachments/assets/22ba02ea-ae01-4fc6-ba6f-1a37ef9038e0

Head pose

https://github.com/user-attachments/assets/fdb78961-239a-49bf-9428-174c7c8704a3

Video-driven setting

nitinmukesh commented 2 months ago

How to use the video-driven setting?

python demo_EDTalk_V.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_driving_path ./test_data/exp_weights/angry.npy --save_path ./output/2.mp4 --face_sr

(EDTalk) C:\usable\EDTalk>python demo_EDTalk_V.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_driving_path ./test_data/exp_weights/angry.npy --save_path ./output/3.mp4 --face_sr
==> loading model
==> loading data
Traceback (most recent call last):
  File "demo_EDTalk_V.py", line 163, in <module>
    demo = Demo(args)
  File "demo_EDTalk_V.py", line 77, in __init__
    self.exp_vid_target, self.fps = vid_preprocessing(args.exp_driving_path)
  File "demo_EDTalk_V.py", line 36, in vid_preprocessing
    fps = vid_dict[2]['video_fps']
KeyError: 'video_fps'
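
A hedged reading of the traceback (inferred, not taken from the EDTalk sources): vid_dict looks like the (video, audio, info) tuple returned by torchvision.io.read_video, and info only carries a 'video_fps' key when the input file has a decodable video stream, which an .npy weights file does not:

```python
# Hypothetical sketch of the failing call path, not EDTalk's actual code;
# it assumes vid_preprocessing wraps torchvision.io.read_video.
import torchvision.io

# read_video returns (video_frames, audio_frames, info). For a file without
# a decodable video stream, info comes back without a 'video_fps' entry.
vid_dict = torchvision.io.read_video("./test_data/exp_weights/angry.npy", pts_unit="sec")
fps = vid_dict[2]["video_fps"]  # raises KeyError: 'video_fps', as above
```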
tanshuai0219 commented 2 months ago

> Teeth are all messed up. Any suggestions?
>
> Lip motion:
>
> python demo_lip_pose.py --fix_pose --source_path ./test_data/identity_source.jpg --audio_driving_path test_data/teaser.mp3 --save_path ./output/1.mp4 --face_sr
>
> 1_512.mp4
>
> Head pose
>
> 1_512.mp4
>
> Video-driven setting

I think the problem in this case is that the face_sr module mistook the mouth region for teeth, so after face_sr this case's mouth is full of teeth.

tanshuai0219 commented 2 months ago

> How to use the video-driven setting?
>
> python demo_EDTalk_V.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_driving_path ./test_data/exp_weights/angry.npy --save_path ./output/2.mp4 --face_sr
>
> (EDTalk) C:\usable\EDTalk>python demo_EDTalk_V.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_driving_path ./test_data/exp_weights/angry.npy --save_path ./output/3.mp4 --face_sr
> ==> loading model
> ==> loading data
> Traceback (most recent call last):
>   File "demo_EDTalk_V.py", line 163, in <module>
>     demo = Demo(args)
>   File "demo_EDTalk_V.py", line 77, in __init__
>     self.exp_vid_target, self.fps = vid_preprocessing(args.exp_driving_path)
>   File "demo_EDTalk_V.py", line 36, in vid_preprocessing
>     fps = vid_dict[2]['video_fps']
> KeyError: 'video_fps'

Hi, you can run:

python demo_EDTalk_V_using_predefined_exp_weights.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_type "angry" --save_path ./output/2.mp4 --face_sr

or:

python demo_EDTalk_V.py --source_path ./test_data/identity_source.jpg --lip_driving_path ./test_data/mouth_source.mp4 --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_driving_path ./test_data/expression_source.mp4 --save_path ./output/2.mp4 --face_sr

When running demo_EDTalk_V.py, --exp_driving_path should be a video path; the KeyError above comes from passing the angry.npy weights file where a driving video is expected, as shown in the sketch below.
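
As a defensive follow-up, a minimal sketch (the helper name and extension list are my own, not EDTalk code) that would turn the cryptic KeyError into a clear message when a weights file is passed where a driving video is expected:

```python
# Hypothetical guard, not part of EDTalk: validate the path before it
# reaches vid_preprocessing.
import os

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}  # assumed set of supported containers

def check_exp_driving_path(path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext == ".npy":
        raise ValueError(
            f"'{path}' is an expression-weight file; use "
            "demo_EDTalk_V_using_predefined_exp_weights.py with --exp_type instead."
        )
    if ext not in VIDEO_EXTS:
        raise ValueError(f"--exp_driving_path expects a video file, got '{path}'.")
```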

tanshuai0219 commented 2 months ago

> Teeth are all messed up. Any suggestions?
>
> Lip motion:
>
> python demo_lip_pose.py --fix_pose --source_path ./test_data/identity_source.jpg --audio_driving_path test_data/teaser.mp3 --save_path ./output/1.mp4 --face_sr
>
> 1_512.mp4
>
> Head pose
>
> 1_512.mp4
>
> Video-driven setting

Hi, I used the newly added video-driven script:

python demo_lip_pose_V.py --source_path test_data/identity_source.jpg --lip_driving_path test_data/mouth_source.mp4 --pose_driving_path test_data/pose_source1.mp4 --face_sr

and the result is:

https://github.com/user-attachments/assets/912097cf-ce92-42ca-960b-c4e0906cb0b0

After face_sr:

https://github.com/user-attachments/assets/c4e1a81c-76c1-462a-b671-9c82e37e14ad

And another source person:

https://github.com/user-attachments/assets/4e630594-1dd2-47fb-b367-6be7a700c769

https://github.com/user-attachments/assets/f1a0b477-a120-47a5-b925-00af4ff09781

You're welcome to try it~

nitinmukesh commented 2 months ago

Thank you @tanshuai0219. I will try this now.

Could you please help with this: if I have an image and audio, and I want expressions like sad, happy, etc., how could I do that? What is the purpose of the NPY files in test_data\exp_weights?

For example: a man is sad and speaking. I would want to use the image of that man, the dialogue audio, and the expression, which I understand comes from the NPY files. What should the command syntax be?

tanshuai0219 commented 2 months ago

> Thank you @tanshuai0219. I will try this now.
>
> Could you please help with this: if I have an image and audio, and I want expressions like sad, happy, etc., how could I do that? What is the purpose of the NPY files in test_data\exp_weights?
>
> For example: a man is sad and speaking. I would want to use the image of that man, the dialogue audio, and the expression, which I understand comes from the NPY files. What should the command syntax be?

Try:

python demo_EDTalk_A_using_predefined_exp_weights.py --source_path ./test_data/identity_source.jpg --audio_driving_path ./test_data/mouth_source.wav --pose_driving_path ./test_data/pose_source2.mp4 --exp_type "sad" --save_path ./output/sad.mp4 --face_sr

The results should be:

https://github.com/user-attachments/assets/3b069f23-aa8b-4438-8401-345854b2e8c0

https://github.com/user-attachments/assets/d3eb156b-a523-4cf2-9d22-78d7de061bd3

The exp_type can be selected from ['angry', 'contempt', 'disgusted', 'fear', 'happy', 'sad', 'surprised'].
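
For anyone curious what the predefined weights contain, a small sketch (the exp_type-to-filename mapping is my assumption from the files in test_data/exp_weights; the array's shape and meaning are defined by EDTalk's expression model):

```python
# Hypothetical helper, not EDTalk code: load one of the predefined
# expression-weight arrays shipped in test_data/exp_weights.
import os
import numpy as np

EXP_TYPES = ["angry", "contempt", "disgusted", "fear", "happy", "sad", "surprised"]

def load_exp_weight(exp_type: str, weights_dir: str = "test_data/exp_weights") -> np.ndarray:
    if exp_type not in EXP_TYPES:
        raise ValueError(f"exp_type must be one of {EXP_TYPES}")
    return np.load(os.path.join(weights_dir, f"{exp_type}.npy"))

w = load_exp_weight("sad")
print(w.shape, w.dtype)  # inspect the weight array selected via --exp_type
```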

nitinmukesh commented 2 months ago

Thank you @tanshuai0219. Is there any way to fix the teeth? I have attached outputs both with and without face_sr. The expressions are very good. I am planning to release a video tutorial for this tool. I am very interested, and a lot of others are too, as it provides uniqueness in terms of expressions. I did a video tutorial for AniTalker too.

Angry

https://github.com/user-attachments/assets/d2626c21-38d8-4918-89af-a10fd9219bd7

https://github.com/user-attachments/assets/f2a1348c-435d-4a1f-9a5e-b5ef0d03cb51

Contempt

https://github.com/user-attachments/assets/afbf474c-1973-4375-85e8-9b332a2694f4

https://github.com/user-attachments/assets/d20b3167-557d-4b2a-910a-3279f0df70fc

Disgusted

https://github.com/user-attachments/assets/c8a2df54-1378-4381-98c8-2bb502edbbfc

https://github.com/user-attachments/assets/688935a2-194c-4fbe-8bea-ed9051b1d0f7

Fear

https://github.com/user-attachments/assets/2b85f055-72f3-4af8-b801-adad175c3473

https://github.com/user-attachments/assets/f6f85957-8f87-4b60-9016-e5989dd95a47

Happy

https://github.com/user-attachments/assets/21b9274d-5963-4c8f-90bb-76ff12d1db08

https://github.com/user-attachments/assets/82f7c2c6-874c-417d-9e60-ec569c57386d

Sad

https://github.com/user-attachments/assets/3c8c521a-d69c-4f5a-8e76-3fb5173a2764

https://github.com/user-attachments/assets/e6037e6e-de2a-4870-9798-0e3cece6d9ac

Surprised

tanshuai0219 commented 2 months ago

> Thank you @tanshuai0219. Is there any way to fix the teeth? I have attached outputs both with and without face_sr. The expressions are very good. I am planning to release a video tutorial for this tool. I am very interested, and a lot of others are too, as it provides uniqueness in terms of expressions. I did a video tutorial for AniTalker too.
>
> Angry
>
> angry.mp4 angry_512.mp4
>
> Contempt
>
> contempt.mp4 contempt_512.mp4
>
> Disgusted
>
> disgusted.mp4 disgusted_512.mp4
>
> Fear
>
> fear.mp4 fear_512.mp4
>
> Happy
>
> happy.mp4 happy_512.mp4
>
> Sad
>
> sad.mp4 sad_512.mp4
>
> Surprised

Since we used only a very small amount of data to train the model, the clarity of the teeth is slightly worse. face_sr is currently the simplest workaround we came up with for this problem. When I have time, I will use more data and improve the model. Thanks for your interest in EDTalk, and thanks for publicizing it.

tanshuai0219 commented 2 months ago

> Thank you @tanshuai0219. Is there any way to fix the teeth? I have attached outputs both with and without face_sr. The expressions are very good. I am planning to release a video tutorial for this tool. I am very interested, and a lot of others are too, as it provides uniqueness in terms of expressions. I did a video tutorial for AniTalker too.
>
> Angry
>
> angry.mp4 angry_512.mp4
>
> Contempt
>
> contempt.mp4 contempt_512.mp4
>
> Disgusted
>
> disgusted.mp4 disgusted_512.mp4
>
> Fear
>
> fear.mp4 fear_512.mp4
>
> Happy
>
> happy.mp4 happy_512.mp4
>
> Sad
>
> sad.mp4 sad_512.mp4
>
> Surprised

Hi, I put your generated cases in the README; thanks for providing interesting cases. If you generate other interesting cases, please contact me. I plan to rebuild the project page (https://tanshuai0219.github.io/EDTalk/) and need more examples. If your cases are presented on the page, I will credit your contribution~

nitinmukesh commented 2 months ago

@tanshuai0219

Sure, I will create more data and examples for each type of inference.

I'm just busy creating the video tutorial; once I post it, I will work on more thorough testing.