harlanhong / CVPR2022-DaGAN

Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
https://harlanhong.github.io/publications/dagan.html

hyperparameters different from paper? #78

Closed: movingright closed this issue 1 year ago

movingright commented 1 year ago

Hi, thanks for the nice repo.

  1. The hyperparameters in vox-adv-256.yaml look different from those in the paper. For example, the loss weight for the discriminator here is 1 (https://github.com/harlanhong/CVPR2022-DaGAN/blob/master/config/vox-adv-256.yaml#L63), but Section 4.2 of your paper (https://arxiv.org/pdf/2203.06605.pdf) says lambda_d = 10. (I've put a small sketch of what I mean below this list.)
  2. Also, when fine-tuning (for 512p) from your checkpoint on a single GPU (batch size 3), taking one (or a few, <10) generator step(s) with learning rate 2e-4 makes the results much worse (e.g., the frame becomes fully black or very blurred). I had to lower the learning rate to 2e-5 to avoid breaking things entirely; see the second sketch after this list.

Can you confirm whether both of these are expected?
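For point 1, this is roughly the override I tried. It is a minimal sketch: the `train_params` / `loss_weights` / `discriminator_gan` key names are my assumption from the FOMM-style config layout this repo inherits, so please double-check them against the actual file.

```python
import yaml

with open("config/vox-adv-256.yaml") as f:
    config = yaml.safe_load(f)

# Key names are assumed from the FOMM config convention; verify before use.
weights = config["train_params"]["loss_weights"]
print("current discriminator weight:", weights.get("discriminator_gan"))
weights["discriminator_gan"] = 10  # lambda_d = 10, per Section 4.2 of the paper

with open("config/vox-adv-256-lambda10.yaml", "w") as f:
    yaml.safe_dump(config, f)
```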
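And for point 2, here is roughly how I lowered the learning rate for fine-tuning. The module and variable names are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this is the DaGAN generator restored from the checkpoint.
generator = nn.Conv2d(3, 3, kernel_size=3)
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# After loading the checkpoint, scale the lr down 10x before any fine-tuning step;
# 2e-4 destroyed the outputs for me, 2e-5 did not.
for group in optimizer.param_groups:
    group["lr"] = 2e-5
```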

harlanhong commented 1 year ago

Thank you for the reminder; the settings in the code are the correct version. We will revise the paper on arXiv.

As for the high-resolution fine-tuning, I am not sure what causes that. You can apply GFPGAN to obtain a higher-resolution result.
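Something like the following should work. It is a minimal sketch using GFPGAN's Python API (`gfpgan.GFPGANer`); the model path, input filename, and upscale factor are examples, not values we ship:

```python
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.3.pth",  # download from the GFPGAN releases page
    upscale=2,                    # 256 -> 512
    arch="clean",
    channel_multiplier=2,
)

frame = cv2.imread("frame_0001.png")  # one 256x256 DaGAN output frame
# enhance() returns (cropped_faces, restored_faces, restored_img)
_, _, restored = restorer.enhance(frame, has_aligned=False, paste_back=True)
cv2.imwrite("frame_0001_512.png", restored)
```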

movingright commented 1 year ago

Thanks for the quick response.

The problem with applying GFPGAN to the 256p output is that high-frequency parts of the image (e.g., hair) flicker a lot across frames. I was thinking a temporal super-resolution algorithm might do better, but I can't find any that work well on human faces.
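The crude workaround I'm experimenting with is just temporally smoothing the restored frames. This is my own naive idea, not a proper temporal SR method, and it trades flicker for some ghosting/blur on fast motion:

```python
import numpy as np

def ema_smooth(frames, alpha=0.7):
    """Blend each frame with an exponential moving average of the previous ones.

    frames: iterable of HxWx3 uint8 arrays; alpha=1.0 disables smoothing.
    """
    smoothed, running = [], None
    for f in frames:
        f = f.astype(np.float32)
        running = f if running is None else alpha * f + (1 - alpha) * running
        smoothed.append(running.astype(np.uint8))
    return smoothed
```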