harlanhong / CVPR2022-DaGAN

Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
https://harlanhong.github.io/publications/dagan.html

fine-tune the model #65

Open wpydcr opened 1 year ago

wpydcr commented 1 year ago

Hi, I find DaGAN works amazingly well.

When I run demo.py with my own test video, the face flickers and the facial features keep changing size, so I decided to fine-tune the model. After training for 8 epochs, the new model only warps the whole picture, not the face.

1. Do I need to train for many epochs even if I only want to fine-tune to improve the result?

2. If I just want to make a good demo, is it best to fine-tune the model on frames of the driving video, and how do I include the source photo in the training as well?

3. About the training data: do you have any tips for preprocessing the pictures so that training works better? I just keep the head in the picture and center the face, but I don't know whether that helps. Would it be better to add a mask to the image that exposes only the face?

Thanks for your code, looking forward to your reply!

harlanhong commented 1 year ago

I'm glad you find the DaGAN model amazing! I'll address your concerns one by one:

Since our model is trained on the Vox1 dataset, it's crucial to ensure that your test data doesn't have a significant domain gap compared to Vox1. This means that the majority of the scene should consist of faces without including too much of the torso. In our tests, as long as the distribution of faces in the test data matches that of Vox1, the results are generally good without the need for additional fine-tuning.
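For example, here is a minimal sketch of cropping a test clip so that the face dominates each frame. It uses OpenCV's Haar face detector purely for illustration; it is not the preprocessing that produced Vox1, and the file names and crop margin are placeholders:

```python
import cv2

# Crop each frame to a face-centered square so the face dominates the scene,
# roughly matching Vox1-style framing. Haar cascade is used only as an example.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face_centered(frame, scale=1.8, out_size=256):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # skip frames where no face is found
    x, y, w, h = max(faces, key=lambda b: b[2] * b[3])  # largest detection
    cx, cy = x + w // 2, y + h // 2
    half = int(max(w, h) * scale / 2)
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, frame.shape[1]), min(cy + half, frame.shape[0])
    return cv2.resize(frame[y0:y1, x0:x1], (out_size, out_size))

cap = cv2.VideoCapture("driving.mp4")   # placeholder input path
writer = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    crop = crop_face_centered(frame)
    if crop is None:
        continue
    if writer is None:
        writer = cv2.VideoWriter("driving_cropped.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"), 25, (256, 256))
    writer.write(crop)
cap.release()
if writer is not None:
    writer.release()
```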

If you decide to fine-tune the model for your specific test video, it's not necessary to train for many epochs. You can start by training for a few epochs and then evaluate the results. If the enhancement effect doesn't meet your expectations, you can continue training for a few more epochs. To fine-tune the model using the driver video, you can include the source photo in the training set, making sure that the distribution of faces remains consistent with the Vox1 dataset.
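As a rough illustration of the fine-tuning pattern itself (the real entry point is run.py with a config file; the checkpoint key "generator", the file name, the learning rate, and the stand-in model below are all assumptions for the sake of a self-contained example):

```python
import torch
import torch.nn as nn

# Stand-in model; in practice you would use the generator/kp_detector built in run.py.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                 nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))
    def forward(self, x):
        return self.net(x)

model = TinyNet()
# checkpoint = torch.load("released_checkpoint.pth.tar", map_location="cpu")
# model.load_state_dict(checkpoint["generator"])  # key name is an assumption

# Fine-tune with a learning rate well below the from-scratch value,
# for only a few epochs, evaluating the result after each one.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, betas=(0.5, 0.999))
data = [(torch.randn(2, 3, 256, 256), torch.randn(2, 3, 256, 256))]  # stand-in loader

for epoch in range(3):
    for source, target in data:
        loss = nn.functional.l1_loss(model(source), target)  # stand-in loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")  # evaluate / inspect results here
```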

Regarding the training data, centering the face and keeping only the head in the picture is a good approach, as it helps maintain consistency with the Vox1 dataset. Adding a mask to expose only the face might be useful, but it's not strictly necessary. The key is to ensure that the face distribution in your training set is similar to that of the Vox1 dataset, as this will help the model generalize better to your test data.
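If you do want to try the masking idea, a small self-contained experiment could look like the sketch below. It reuses the Haar detector from the earlier snippet; the ellipse proportions and the gray fill value are arbitrary choices, and whether this actually helps is untested:

```python
import cv2
import numpy as np

# Keep only an elliptical face region and fill the rest with neutral gray.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("source.png")  # placeholder input
faces = detector.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 1.1, 5)
if len(faces):
    x, y, w, h = max(faces, key=lambda b: b[2] * b[3])
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.ellipse(mask, (x + w // 2, y + h // 2), (int(w * 0.7), int(h * 0.9)),
                0, 0, 360, 255, -1)
    masked = np.full_like(frame, 128)   # neutral background
    masked[mask > 0] = frame[mask > 0]  # paste the face region back in
    cv2.imwrite("source_masked.png", masked)
```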

I hope this answers your questions.

wpydcr commented 1 year ago

Thank you very much for replying to me so quickly. I sent you an email (fhongac@cse.ust.hk) with a more detailed description; do you have time to look at it? Looking forward to your reply! Thanks!