tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

character expressions #18

Closed lifeng5658 closed 5 months ago

lifeng5658 commented 5 months ago

I think you should focus on facial expressions. The current leading systems in this field, such as EMO and VASA, not only handle lip-sync animation but also produce very strong facial muscle movements and rich character expressions, which arguably contributes 70% of the final effect. Your research has taken a new direction. ^_^

zhangjun001 commented 5 months ago

Thanks. If you look at the results carefully, the audio itself carries characteristics such as emotion, which correspond to facial muscle changes and even laryngeal and cranial movements. A data-driven model of this kind can therefore often learn emotions and other muscle expressions from the audio alone.
In fact, adding explicit expression control and other signals is very intuitive. We have run some experiments along these lines and have some results. But such signals can sometimes fundamentally conflict with the audio: for example, a clip may carry anger in the voice while the control signal forces the character to express happiness.
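The conflict described above can be made concrete with a toy sketch. Everything here is illustrative, not V-Express's actual implementation: the emotion embeddings, the `conflict_score` heuristic, and the `blend_condition` reconciliation strategy are all assumptions, just to show how an audio-implied emotion and a forced explicit expression can disagree as conditioning signals.

```python
import numpy as np

# Hypothetical 2-D unit embeddings for emotions (purely illustrative;
# a real system would infer these from audio features or labels).
EMOTIONS = {
    "angry": np.array([1.0, 0.0]),
    "happy": np.array([0.0, 1.0]),
    "neutral": np.array([1.0, 1.0]) / np.sqrt(2.0),
}

def conflict_score(audio_emotion: str, forced_expression: str) -> float:
    """Cosine distance between the emotion implied by the audio and the
    explicitly forced expression; higher means the two signals disagree more."""
    a, b = EMOTIONS[audio_emotion], EMOTIONS[forced_expression]
    return 1.0 - float(a @ b)

def blend_condition(audio_emotion: str, forced_expression: str, w: float = 0.5) -> np.ndarray:
    """One possible reconciliation: a weighted blend of the two conditioning
    vectors, renormalized to unit length before being fed to the model."""
    v = (1.0 - w) * EMOTIONS[audio_emotion] + w * EMOTIONS[forced_expression]
    return v / np.linalg.norm(v)
```

In this toy setup, forcing "happy" on top of "angry" audio gives the maximum conflict score, while matching signals score zero; a blend weight `w` lets the explicit control override the audio only partially instead of contradicting it outright.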