sangwoomo / FreezeD

Freeze the Discriminator: a Simple Baseline for Fine-Tuning GANs (CVPRW 2020)
https://arxiv.org/abs/2002.10964

you should change inception model to evaluation mode before calculating FID score #3

Open jychoi118 opened 4 years ago

jychoi118 commented 4 years ago

You should switch the Inception model to evaluation mode before calculating the FID score. The Inception model contains batch normalization layers, which behave differently in training and evaluation modes.

For example, you should add "inception.eval()" below line 486 of stylegan/finetune.py

With this correction, I got a significantly different FID score compared to the values reported in your paper.
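To illustrate why the mode matters, here is a minimal sketch in plain Python, not the repository's code. `TinyBatchNorm` is a hypothetical, simplified stand-in for a batch normalization layer (one feature, biased variance for the running estimate), written only to show that the same input gives different outputs in the two modes, and that a forward pass in training mode also changes the layer's running statistics. In PyTorch, the corresponding switch is the `eval()` method of `nn.Module`.

```python
import math


class TinyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True  # layers start in training mode

    def eval(self):
        # Switch to evaluation mode: use the stored running statistics.
        self.training = False
        return self

    def __call__(self, batch):
        if self.training:
            # Training mode: normalize with the statistics of this batch...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and update the running estimates as a side effect.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: statistics are fixed, nothing is updated.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = TinyBatchNorm()
batch = [10.0, 12.0, 14.0]

train_out = bn(batch)   # normalized with batch statistics
bn.eval()
eval_out = bn(batch)    # normalized with running statistics

print("train mode:", train_out)
print("eval mode: ", eval_out)
```

Because the features fed into the FID computation pass through many such layers, leaving the network in training mode makes the extracted features depend on the batch composition.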

sangwoomo commented 4 years ago

Hi, thank you for noticing that! I completely forgot about this issue.

By the way, while correcting the mode may degrade the absolute FID scores of both fine-tuning and FreezeD, I guess the relative order would remain. Can you report your values if my guess is wrong?
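For reference, the standard definition of FID is given below. It is not quoted from this repository; the symbols $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are notation assumed here for the mean and covariance of Inception features of real and generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Every term is computed from Inception features, so changing how the network normalizes its activations changes the absolute score for all methods at once.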

jychoi118 commented 4 years ago

Yes, the relative order remains the same.

I'm sorry to say that I lost the values I obtained with your experiment setting. I will report them after I rerun the experiment with your setting. However, I'm currently experimenting with your FreezeD method on StyleGAN-V2 and the AFHQ dataset (500-image dog test set) from stargan-v2. I got an FID score of 49.3 without eval(), and 98.1 with eval(). StyleGAN-V1 will probably show roughly doubled FID scores as well.

And, thank you for your nice research!

sangwoomo commented 4 years ago

Happy to hear that the relative order is the same! I hope you develop a better method and report the updated values in your manuscript :)

Hsintien-Ng commented 3 years ago

Hi, do you mean an FID score of 49.3 in eval() mode and 98.1 in train() mode?