zhangchenxu528 / FACIAL

FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.
GNU Affero General Public License v3.0

Model Inference on Image or Other Videos #49

Open RAJA-PARIKSHAT opened 2 years ago

RAJA-PARIKSHAT commented 2 years ago

I have a question; it may not be relevant, as I don't have a full understanding of the model. Can your model generate a talking face from a single image? And if I run inference on a video and audio other than the training videos and audios, will the model perform well on them? In other words, if I train the model on roughly 5 minutes of my own video, will it then be able to generate my other videos given only an audio input and an image input? @zhangchenxu528

zhangchenxu528 commented 2 years ago

In the face2vid module, you need to perform vid2vid training on your own video to ensure high-quality video generation. Our method does not accept a single image as input.

RAJA-PARIKSHAT commented 2 years ago

OK, I did that: I followed your Colab to train on my own video. But can I use another of my videos with the model, other than the video I trained on? What do you think the results would be?

zhangchenxu528 commented 2 years ago
  1. Once our face2vid network is trained, it does not accept new images or videos as references.
  2. For your scenario, I suggest replacing the face2vid module with the image2image translation network used in MakeItTalk.