zhouquan9 closed this issue 3 years ago
For question 1, this is a bit hard to do because the landmark->expression_parameter mapping is one-to-many :( You would need to predict a new expression parameter from the adjusted landmarks. For question 2, our model is designed for English. You would need to train a network from scratch on a large Chinese dataset.
Hi Xin Wen, great work and very well-organized code. I really appreciate your effort!
I noticed the predicted landmarks can be obtained here. However, if I change the position of these landmarks, what's the best way to re-render the images using the adjusted landmarks?
A little more context... I've tried a bunch of speech-to-video models with a Chinese audio source and found that while the big lip movements are synced, the lips move more frequently than they should. I'm trying to reduce the jitter in the predicted landmarks and make the result more realistic.
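For reference, here is a minimal sketch of the kind of jitter reduction meant above: a centered moving average over time. It assumes the predicted landmarks are available as a NumPy array of shape (frames, points, 2). The name `smooth_landmarks` and the `window` parameter are hypothetical and not part of this repository, and the sketch does not address re-rendering from the smoothed landmarks.

```python
import numpy as np


def smooth_landmarks(landmarks: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over time for a (T, N, 2) landmark array.

    Hypothetical helper: the array layout is an assumption, not the
    repository's actual output format.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    num_frames = landmarks.shape[0]
    # Repeat the first and last frames so the output keeps the same length.
    padded = np.pad(landmarks, ((half, half), (0, 0), (0, 0)), mode="edge")
    # Stack the `window` time-shifted copies and average across them.
    shifted = np.stack(
        [padded[i:i + num_frames] for i in range(window)], axis=0
    )
    return shifted.mean(axis=0)
```

A larger `window` removes more high-frequency motion but also flattens fast, legitimate lip movements, so it would likely need tuning per frame rate.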