yoyo-nb / Thin-Plate-Spline-Motion-Model

[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.

Full frame HD talking head with driving audio #58

mvoodarla opened this issue 1 year ago

mvoodarla commented 1 year ago

Hey @yoyo-nb!

This isn't really an issue, more some insight from my experiments on how one might composite the generated talking head back into the original frame it was cropped from. Just thought it'd be cool to share :) It makes something like the following possible, rather than just a small driving-video-sized crop focused on the face. If you squint, you can kind of tell the crop was pasted back into the original frame, but it isn't super noticeable.

https://user-images.githubusercontent.com/11367688/225168590-78099062-f1fe-437b-8faa-7fd02960a972.mp4

The core change I made is to the face alignment: it's now based purely on rotating the image until the eyes are level. I didn't notice any quality degradation versus your approach, but it lets us "unwarp" the generated crop so it sits level with the rest of the original frame again.
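
Roughly, the alignment and paste-back look like this. It's a simplified sketch: the eye centers can come from any landmark detector, and the function names, crop-box handling, and feathering below are just illustrative, not the exact code in the linked repo:

```python
import cv2
import numpy as np

def eye_level_affine(left_eye, right_eye):
    """Build a 2x3 affine that rotates the frame about the eye midpoint so the
    eyes end up level. left_eye / right_eye are (x, y) eye centers from any
    landmark detector (left_eye = the eye with the smaller x)."""
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))            # head roll in degrees
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)   # rotation only, no scale
    return M, cv2.invertAffineTransform(M)

def paste_back(original, generated_crop, M_inv, box):
    """Composite the animated crop back into the full frame.
    `box` = (x0, y0, w, h) is where the crop was taken from the eye-leveled frame."""
    h_img, w_img = original.shape[:2]
    x0, y0, w, h = box
    # Put the generated crop back at its location in the eye-leveled frame,
    # then undo the rotation so it lines up with the untouched background.
    canvas = np.zeros_like(original)
    canvas[y0:y0 + h, x0:x0 + w] = cv2.resize(generated_crop, (w, h))
    mask = np.zeros((h_img, w_img), dtype=np.uint8)
    mask[y0:y0 + h, x0:x0 + w] = 255
    canvas = cv2.warpAffine(canvas, M_inv, (w_img, h_img))
    mask = cv2.warpAffine(mask, M_inv, (w_img, h_img))
    # Feather the seam so the pasted region blends into the original frame.
    alpha = cv2.GaussianBlur(mask, (31, 31), 0).astype(np.float32)[..., None] / 255.0
    return (canvas * alpha + original * (1.0 - alpha)).astype(np.uint8)
```

The key point is that the only warp between the crop and the full frame is a single rotation, so it's trivially invertible and the background never gets distorted.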

On top of that, I made it so you can supply a driving audio clip along with the driving video: we first run Wav2Lip on that pair, then use the lip-synced video to drive the avatar. This lets us make the avatar say whatever we want, even without a driving video of someone saying it.
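
Concretely this is just chaining the two repos' inference scripts, something like the following. Checkpoints, paths, and flag names are placeholders written from memory of the two demo scripts, so check them against your local checkouts:

```python
import subprocess

def make_talking_avatar(source_image, driving_video, driving_audio,
                        out="result.mp4"):
    """Lip-sync the driving video to the driving audio with Wav2Lip, then use
    the lip-synced clip to drive the TPS motion model on the source image."""
    lipsynced = "wav2lip_out.mp4"
    # 1) Wav2Lip: make the driving speaker mouth the supplied audio.
    subprocess.run([
        "python", "Wav2Lip/inference.py",
        "--checkpoint_path", "Wav2Lip/checkpoints/wav2lip_gan.pth",
        "--face", driving_video,
        "--audio", driving_audio,
        "--outfile", lipsynced,
    ], check=True)
    # 2) TPSMM: animate the source image with the lip-synced driving video.
    subprocess.run([
        "python", "demo.py",
        "--config", "config/vox-256.yaml",
        "--checkpoint", "checkpoints/vox.pth.tar",
        "--source_image", source_image,
        "--driving_video", lipsynced,
        "--result_video", out,
    ], check=True)
    return out
```

Wav2Lip only edits the mouth region, so head pose and expression still come from the driving video; the audio only controls the lips.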

Here are some more examples:

https://user-images.githubusercontent.com/11367688/225170178-b0d937de-9373-415b-87ff-43f4fae669f1.mp4

https://user-images.githubusercontent.com/11367688/225170184-0eca36e9-e291-44cb-896c-9cf738bdfff2.mp4

My colleague wrote a bit about it here: https://www.sievedata.com/blog/realistic-ai-avatars

Code to deploy on Sieve or to run yourself: https://github.com/sieve-community/examples/tree/main/talking_head_avatars

If you think it's worthwhile, I'd be happy to contribute this to your repo, though it might be tricky given the moving parts around Wav2Lip.

In case you're wondering who I am: I'm Mokshith and I work at Sieve, where we make it easy for people to build and run video AI pipelines. We realized that to make a lot of research useful, you end up having to combine it with many moving parts, and even then it's really slow to run. We're just building cool projects on top of Sieve for fun! You can also clone and run what I described above yourself via the links above after signing up for free :)

fjesikfjdskl commented 11 months ago

Hello, is the source code open source?

ak01user commented 11 months ago

superman