liutaocode / DiffDub

[ICASSP 2024] DiffDub: Person-generic visual dubbing using inpainting renderer with diffusion auto-encoder
https://liutaocode.github.io/DiffDub/
Apache License 2.0

How to increase face likeness? #4

Closed — harisreedhar closed this issue 3 months ago

harisreedhar commented 3 months ago

Thanks for the wonderful work. I tried both the one-shot and few-shot approaches, but the likeness of the face is a bit off. Any tips to improve it?

https://github.com/liutaocode/DiffDub/assets/46858047/60ff1146-e433-44a6-8a87-4a08af0015c5

liutaocode commented 3 months ago

Thank you for your question.

We also encountered similar issues during testing, especially on out-of-distribution data, where the appearance of individuals changed somewhat. This is likely due to insufficient data volume: our model was trained on HDTF, whose training set contains just over 200 identities. That is quite small, especially for diffusion models, which benefit from scale (200+ identities versus a roughly 1B-parameter model).

I suggest retraining the first stage on a dataset with more identities, such as VoxCeleb2. The goal of the first stage should be to restore the conditioned mouth as faithfully as possible. Then retrain the second stage.
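As a rough illustration of that stage-one goal, here is a minimal PyTorch sketch of a masked-inpainting reconstruction objective: the mouth region is masked out, the renderer fills it back in from a condition signal, and the loss weights the mouth region extra so it is restored faithfully. The function and parameter names (`renderer`, `mouth_mask`, `mouth_cond`) and the loss weighting are illustrative assumptions, not the repo's actual API.

```python
import torch
import torch.nn.functional as F

def inpainting_loss(renderer, frame, mouth_mask, mouth_cond):
    """Stage-one style objective (sketch): mask the mouth, reconstruct the frame.

    renderer   -- any network taking (masked_frame, condition) -> full frame
    frame      -- ground-truth frames, shape (B, 3, H, W)
    mouth_mask -- binary mask of the mouth region, shape (B, 1, H, W)
    mouth_cond -- conditioning signal for the mouth (representation is up to the model)
    """
    masked = frame * (1.0 - mouth_mask)            # zero out the mouth pixels
    recon = renderer(masked, mouth_cond)           # renderer inpaints the hole
    full_loss = F.l1_loss(recon, frame)            # whole-frame reconstruction
    mouth_loss = F.l1_loss(recon * mouth_mask,     # extra penalty on the mouth
                           frame * mouth_mask)
    return full_loss + 2.0 * mouth_loss            # 2.0 weight is an assumed hyperparameter
```

A trained stage one should drive `mouth_loss` close to zero on held-out identities; if it cannot, the second stage has nothing reliable to condition on.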

liutaocode commented 3 months ago

Additionally, I thought of another possible solution: instead of predicting noise during diffusion training, predict the original image, and on top of that add a face-recognition loss term so that the predicted image stays as identity-consistent with the original as possible. That said, I still think the dataset is the more significant issue.
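A minimal sketch of that idea in PyTorch, assuming a standard DDPM forward process: the network outputs the clean image x0 directly, and a cosine-distance term between face embeddings of the prediction and the ground truth is added. The `id_encoder` stands in for a pretrained, frozen face-recognition embedder (e.g. ArcFace); the 0.1 loss weight and all names here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def x0_prediction_loss(model, id_encoder, x0, t, alphas_cumprod):
    """One training step: the model predicts x0 (not the noise) from x_t.

    model          -- network taking (x_t, t) -> predicted x0
    id_encoder     -- frozen face-recognition embedder, (B, 3, H, W) -> (B, D)
    x0             -- clean target frames, shape (B, 3, H, W)
    t              -- sampled timesteps, shape (B,)
    alphas_cumprod -- precomputed cumulative alpha-bar schedule, shape (T,)
    """
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)             # alpha_bar_t per sample
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward diffusion q(x_t | x0)
    x0_pred = model(x_t, t)                                 # network outputs x0 directly
    recon = F.mse_loss(x0_pred, x0)                         # standard reconstruction term
    # identity loss: cosine distance between face embeddings
    emb_pred = F.normalize(id_encoder(x0_pred), dim=-1)
    emb_real = F.normalize(id_encoder(x0), dim=-1)
    id_loss = (1.0 - (emb_pred * emb_real).sum(dim=-1)).mean()
    return recon + 0.1 * id_loss                            # 0.1 weight is an assumed hyperparameter
```

Predicting x0 rather than the noise is what makes the identity term cheap to apply: the embedder can be run on the prediction at every step without first converting a noise estimate back into an image.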

harisreedhar commented 3 months ago

Thanks