tobran / GALIP

[CVPR2023] A faster, smaller, and better text-to-image model for large-scale training

About the normalization for Inception v3 when calculating FID #27

Open AtsukiOsanai opened 1 month ago

AtsukiOsanai commented 1 month ago

Thank you for sharing your excellent work!

I'm curious why you chose to re-normalize the fake images in this section. Since Inception v3 expects input values in [-1, 1], the generated fake images on this line already satisfy that requirement. Re-normalizing them could therefore produce an incorrect FID score if you are using a pre-trained Inception v3 model from torchvision.

From what I understand, re-normalizing the tensor is necessary only when standard normalization with ImageNet statistics is applied, as in this ImageNet training example.
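
For concreteness, here is a minimal sketch of the two conventions being contrasted; the tensor name, shape, and the ImageNet statistics are illustrative only and are not taken from either repository:

```python
import torch
from torchvision import transforms

# Hypothetical generator output after tanh: one image with values in [-1, 1].
fake = torch.tanh(torch.randn(3, 299, 299))

# Reading A: the [-1, 1] tensor is fed to Inception v3 as-is.
input_a = fake

# Reading B: remap to [0, 1] first, then apply ImageNet statistics; this second
# step is only needed when the backbone expects ImageNet-style preprocessing.
imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
input_b = imagenet_norm((fake + 1.0) / 2.0)
```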

I hope you can clarify this point so that the comparison between image-synthesis methods remains fair.

tobran commented 1 month ago

Thanks for your attention to our work. We checked the normalization in our evaluation code and found that this problem does not exist.

The input of Inception v3 is not [-1, 1]. We used the DM-GAN code to compute FID. The image passes through transforms.ToTensor(), which maps its values to [0, 1] (https://github.com/MinfengZhu/DM-GAN/blob/d515995b6fade48be9762c7f069c9d889d4d5559/eval/FID/fid_score.py#L222), and it is then normalized to the target interval (https://github.com/MinfengZhu/DM-GAN/blob/d515995b6fade48be9762c7f069c9d889d4d5559/eval/FID/fid_score.py#L222).

Our code (https://github.com/tobran/GALIP/blob/35d0c2fd64eec2e95e7c905cb71f96d81475a46f/code/lib/modules.py#L270) converts the images from [-1, 1] to [0, 1] (matching the distribution after DM-GAN's transforms.ToTensor()), and then normalizes them to the target interval, so the data distribution finally fed to Inception v3 is consistent with DM-GAN's.
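
For reference, a minimal sketch of the chain described above (generator output in [-1, 1], remapped to [0, 1], then normalized to the target interval); the function name and the mean/std values are placeholders, not copied from the GALIP or DM-GAN code:

```python
import torch

# TARGET_MEAN / TARGET_STD stand in for whatever statistics the DM-GAN FID
# script applies; they are placeholders, not values from either repository.
TARGET_MEAN = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
TARGET_STD = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

def prepare_fake_for_fid(fake_images: torch.Tensor) -> torch.Tensor:
    """Map generator output in [-1, 1] to the range the FID backbone expects."""
    x = (fake_images + 1.0) / 2.0          # [-1, 1] -> [0, 1], matching ToTensor()
    return (x - TARGET_MEAN) / TARGET_STD  # [0, 1] -> target interval
```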

We also saved the generated images and evaluated them with DM-GAN's evaluation code; the results are consistent with those produced by our code.