xinwen-cs / AudioDVP

AudioDVP: Photorealistic Audio-driven Video Portraits
https://github.com/xinwen-cs/AudioDVP

how to adjust landmarks? #20

Closed zhouquan9 closed 3 years ago

zhouquan9 commented 3 years ago

Hi Xin Wen, great work and very well-organized code. I really appreciate your effort!

I noticed the predicted landmarks can be obtained here. However, if I change the position of these landmarks, what's the best way to re-render the images using the adjusted landmarks?

A little more context: I've tried several speech-to-video models with Chinese audio and found that, while the large lip movements are in sync, the lips move more frequently than they should. I'm trying to reduce the jitter in the predicted landmarks to make the result more realistic.
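To make the jitter-reduction idea concrete, here is a minimal sketch of temporal smoothing with a centered moving average. It assumes the predicted landmarks are available as a NumPy array of shape (T, N, 2) (frames, points, x/y); that layout, the function name `smooth_landmarks`, and the default window size are illustrative assumptions, not something taken from the AudioDVP code.

```python
import numpy as np

def smooth_landmarks(landmarks: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over time for a (T, N, 2) landmark sequence.

    Hypothetical helper: the (T, N, 2) layout is an assumption about how
    the predicted landmarks are stored.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    # Repeat the first and last frames so the output keeps all T frames.
    padded = np.pad(landmarks, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Filter each coordinate track independently along the time axis.
    return np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="valid"), 0, padded
    )
```

A larger window removes more jitter but also flattens fast, legitimate lip motion, so the window would likely need tuning per frame rate. Smoothing the landmarks alone does not change the rendered video; the smoothed positions still have to be mapped back to expression parameters.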

xinwen-cs commented 3 years ago

For question 1, this is a bit hard to do because the landmark -> expression_parameter mapping is one-to-many :( You would need to predict a new expression parameter from the adjusted landmarks. For question 2, our model is designed for English. You would need to train a network from scratch on a large Chinese dataset.
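One way to picture "predict a new expression parameter from adjusted landmarks" is a regularized least-squares fit. The sketch below is not the AudioDVP method. It assumes, for illustration only, that with head pose and camera held fixed the flattened 2D landmarks are a linear function of the expression parameter, `mean + basis @ delta`. The names `fit_expression`, `mean`, `basis`, and `prior` are hypothetical. Because many parameters can explain the same landmarks, the fit is pulled toward the originally predicted parameter (`prior`) to choose among them.

```python
import numpy as np

def fit_expression(target, mean, basis, prior, reg=1e-2):
    """Fit an expression parameter to adjusted landmarks (illustrative only).

    Minimizes ||mean + basis @ delta - target||^2 + reg * ||delta - prior||^2.
    target, mean: landmark coordinates, any shape with 2N entries in total.
    basis: assumed (2N, K) linear map from expression parameter to landmarks.
    prior: (K,) parameter originally predicted for this frame.
    """
    # Landmark error that the prior parameter leaves unexplained.
    residual = target.ravel() - mean.ravel() - basis @ prior
    k = basis.shape[1]
    # Normal equations of the ridge problem, solved for the update to prior.
    lhs = basis.T @ basis + reg * np.eye(k)
    rhs = basis.T @ residual
    return prior + np.linalg.solve(lhs, rhs)
```

With a larger `reg` the result stays closer to the original prediction; with a smaller one it follows the adjusted landmarks more tightly. A real pipeline would also have to account for pose and projection, which this sketch leaves out.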