magic-research / magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
https://showlab.github.io/magicanimate/
BSD 3-Clause "New" or "Revised" License

How did the demo videos achieve facial movements when DensePose does not contain facial information? #54

Open chen-rn opened 7 months ago

chen-rn commented 7 months ago

In this demo, we can see the girl moving her mouth, as if lip syncing.

However, since DensePose does not contain any facial information (it's just body-part blobs; see the sketch below), and the initial image provides only a single reference of the face, how is the model extrapolating lip-sync movements?

From my personal experiments, it seems very challenging to maintain facial coherence, especially during dynamic movements.

I'd love to learn more about how those demo videos were achieved.
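
For context, here is a minimal sketch (not MagicAnimate's code) of what a DensePose-style IUV map contains, assuming the standard 24-part format; the array values below are fabricated purely for illustration:

```python
# Minimal sketch: inspecting a DensePose-style IUV map.
# DensePose assigns each pixel one of 24 coarse body-part labels (the I
# channel) plus per-part UV coordinates. The head is a single coarse blob,
# with no mouth, eye, or lip landmarks.
import numpy as np

# Hypothetical IUV map, shape (H, W, 3) with channels (part index I, U, V).
iuv = np.zeros((256, 256, 3), dtype=np.float32)
iuv[40:90, 100:156, 0] = 23                           # "head" is one part label
iuv[40:90, 100:156, 1:] = np.random.rand(50, 56, 2)   # continuous UV coords

part_ids = np.unique(iuv[..., 0]).astype(int)
print("part labels present:", part_ids)  # e.g. [0 23] -- background + head
# Nothing here encodes lip position, so lip sync cannot come from this signal.
```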

FurkanGozukara commented 7 months ago

I am not able to reproduce the demos' quality :/

chrislytras commented 7 months ago

They probably used samples from the training set. The model is overfit to them.