manupatel007 opened this issue 3 years ago
That's a good idea. The audio-to-landmark part is already a lightweight model that can run in real time. The time-consuming parts are the image warping and the translation network. You could try replacing the residual blocks in the current image translation network with a MobileNet-style structure to increase speed.
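For illustration, here is a minimal sketch (not this repo's actual code) of what such a swap could look like: a MobileNetV2-style inverted residual block with a depthwise 3x3 convolution, used in place of the standard residual blocks in the generator's bottleneck. The class names, channel counts, and block count are hypothetical.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depthwise -> 1x1 project."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise 3x3: groups == hidden makes this much cheaper than a full conv
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # identity skip connection, like the residual blocks it replaces
        return x + self.block(x)

# Hypothetical usage: replace the stack of residual blocks in the translation
# network's bottleneck with the lighter blocks. Note this changes the weights'
# shapes, so it would need retraining/fine-tuning rather than reusing the
# existing checkpoints directly.
bottleneck = nn.Sequential(*[InvertedResidual(256) for _ in range(6)])
```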
Great suggestion! It would be awesome to use this work in real time! Would this approach require retraining, or is there a way to do it with the available checkpoints? Thank you for sharing!
We're working on a more powerful model, like a v2, along with its training code. The training code for this version is not available for now. Checkpoints for the current version are available; please check the README in the repository root.
Hey! This is awesome work on animating faces according to text. I wanted to know: if the image (just a sketch) is fixed and only the audio varies, could we build a custom lightweight model, MobileNet-like, that generates the input data for inference and the audio-to-landmarks prediction entirely in the browser using WebGL, in real time?
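One possible route, since the audio-to-landmark part is already lightweight, would be to export it to ONNX and run it in the browser with onnxruntime-web on its WebGL backend. The sketch below only illustrates the export step; `AudioToLandmark`, the checkpoint name, and the input shape are hypothetical stand-ins, not this repo's actual classes or tensor layout.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the audio-to-landmark predictor; the real model in
# this repo has a different architecture and checkpoint format.
class AudioToLandmark(nn.Module):
    def __init__(self, feat_dim=80, n_landmarks=68):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, 256, batch_first=True)
        self.head = nn.Linear(256, n_landmarks * 3)

    def forward(self, audio_feat):
        out, _ = self.rnn(audio_feat)
        return self.head(out)

model = AudioToLandmark().eval()

# Hypothetical input shape: (batch, frames, mel features)
dummy_audio = torch.randn(1, 18, 80)

torch.onnx.export(
    model,
    dummy_audio,
    "audio2landmark.onnx",
    input_names=["audio_features"],
    output_names=["landmarks"],
    opset_version=12,
    dynamic_axes={"audio_features": {1: "frames"}, "landmarks": {1: "frames"}},
)
# The resulting .onnx file could then be loaded in the browser with
# onnxruntime-web and executed on its WebGL backend for real-time prediction.
```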