yzhou359 / MakeItTalk


Is there any scope for making a lightweight model? #40

Open manupatel007 opened 3 years ago

manupatel007 commented 3 years ago

Hey! This is awesome work on animating faces according to text. I wanted to know: if the image (just a sketch) is fixed and only the audio varies, could we build a custom lightweight model, like MobileNet, that runs the input-data generation for inference and the audio-to-landmark prediction in real time, using only the browser's WebGL?

yzhou359 commented 3 years ago

That's a good idea. The audio-to-landmark part is already a lightweight model that can run in real time. The time-consuming parts are the image warping and the image translation network. You can try replacing the residual blocks in the current image translation network with MobileNet-style blocks to increase the speed.
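To give a rough sense of why MobileNet-style blocks could help, here is a cost comparison sketch. It rests on assumptions not stated in the thread: it treats "MobileNet structure" as replacing each 3x3 convolution with a depthwise-separable convolution (a 3x3 depthwise conv followed by a 1x1 pointwise conv, as in MobileNetV1), ignores biases, normalization and activations, and uses a hypothetical 256-channel, 64x64 feature map rather than MakeItTalk's actual translation-network configuration. The helper names are illustrative only.

```python
def conv3x3_params(c_in, c_out):
    """Weights of a standard 3x3 convolution (bias omitted)."""
    return 9 * c_in * c_out


def depthwise_separable_params(c_in, c_out):
    """3x3 depthwise conv (one filter per input channel) + 1x1 pointwise conv."""
    return 9 * c_in + c_in * c_out


def residual_block_params(c, conv_fn):
    """Basic residual block: two convolutions c -> c plus an identity skip (no weights)."""
    return 2 * conv_fn(c, c)


def macs(weights, height, width):
    # For stride-1, 'same'-padded convolutions, every weight is used
    # once per output spatial position, so MACs = weights * H * W.
    return weights * height * width


if __name__ == "__main__":
    # Hypothetical sizes for illustration, not taken from MakeItTalk.
    C, H, W = 256, 64, 64
    std = residual_block_params(C, conv3x3_params)
    sep = residual_block_params(C, depthwise_separable_params)
    print(f"standard residual block:   {std:,} weights, {macs(std, H, W):,} MACs")
    print(f"depthwise-separable block: {sep:,} weights, {macs(sep, H, W):,} MACs")
    print(f"reduction: {std / sep:.2f}x")
```

For C = 256 the depthwise-separable version needs roughly 8.7x fewer weights and multiply-adds; in general the ratio is 9C / (C + 9). Fewer MACs do not translate one-to-one into wall-clock speedup, since depthwise convolutions are often memory-bound, and the replaced blocks would most likely need retraining because their weight shapes no longer match the existing checkpoint.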

luantunez commented 3 years ago

Great suggestion! It would be awesome to use this work in real time. Would this approach require retraining, or is there a way to do it with the available checkpoints? Thank you for sharing!

yzhou359 commented 3 years ago

We're working on a more powerful model, something like a v2, which will come with training code. The training code for the current version is not available for now. Checkpoints for the current version are available; please check the README in the repository root.