yuefeng21 opened this issue 2 years ago
Hi Roy, I used https://github.com/abdulfatir/gan-metrics-pytorch, as suggested in a previous issue, to calculate FID and KID, but I cannot reproduce the evaluation numbers from the paper. I sampled 5k 512x512 images from FFHQ and from your model, and I got FID 214.33 (0.988) and KID 0.233 (0.003), whereas the paper reports FID 11.5 and KID 2.65. Could you please explain exactly how you computed the FID and KID for your results?
Appreciate your help.
Hi @yuefeng21,
Here's what you need to do to replicate the results:
Let me know if there are any further issues.
You mean by setting truncation_ratio to 1 and mean_latent to None?
I also have a question: why do people evaluate results at 256 or 512 resolution rather than at 1024?
If I recall correctly, setting the truncation ratio to 1 should be enough (see the sketch below), but double-check me on this.
Regarding 256x256, I'm not sure, to be honest. We evaluated at 256x256 to make the comparison fair and to repeat the same conditions as the baseline methods (as well as to validate their reported FID scores). It's also a runtime issue: running this on 20-50k 1024x1024 images can take a lot of time.
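(For anyone else reading along: with a truncation ratio of 1 the truncation trick is a no-op, so the mean latent is never touched. A minimal sketch of the usual StyleGAN-style logic, with made-up shapes, not the repo's actual code:)

```python
import torch

def truncate(w, mean_latent, truncation):
    # Standard truncation trick: pull the style code toward the mean latent.
    if truncation < 1 and mean_latent is not None:
        return mean_latent + truncation * (w - mean_latent)
    return w  # truncation == 1 (or no mean latent): w passes through unchanged

w = torch.randn(4, 512)  # hypothetical batch of 512-dim style codes
assert torch.equal(truncate(w, mean_latent=None, truncation=1.0), w)
```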
I set truncation_ratio to 1 and sampled 5k images; the FID I got is 173.72 (0.727).
What resolution are you using, both for the real and the generated images?
Real images: I down-sampled them to 256 resolution. Generated images: first generated at 1024 resolution and then down-sampled to 256.
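(For reference, the downsampling step could look like this; a sketch assuming the generator output is an (N, 3, 1024, 1024) tensor in [-1, 1]:)

```python
import torch
import torch.nn.functional as F

fake = torch.rand(8, 3, 1024, 1024) * 2 - 1  # stand-in for generator output in [-1, 1]

# Downsample to the 256x256 evaluation resolution; note that the choice of
# resampling filter ('bilinear' vs. 'area', PIL vs. torch) can itself shift FID.
fake_256 = F.interpolate(fake, size=(256, 256), mode='bilinear', align_corners=False)
```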
It seems that I saved the images in the [-1,1] range. After I scaled them to [0, 255], the FID I got is 4139.23 (41.315).
But none of those results seems anywhere near the 11.5 in the paper 🤣
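(For what it's worth, undoing that scaling is just the inverse affine map; a tiny sketch with a stand-in array:)

```python
import numpy as np

imgs_255 = np.random.randint(0, 256, size=(8, 3, 256, 256)).astype(np.float32)  # stand-in

# Map [0, 255] back to the generator's native [-1, 1] range.
imgs = imgs_255 / 127.5 - 1.0
```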
The images should be in the [-1,1] range. What is your current KID score for this setup? (KID is more robust to small sample sizes.)
In addition, how many real images did you use?
Looks like the images actually need to be in [0,1]. I just looked at the FID code and there's a normalization step (set to True by default) in the Inception net forward pass.
Current KID is 0.161 (0.003) at 256 resolution with no mean latent, using 5k images.
OK I will double check that.
We saved all output images in a single large .npy file, so the line you marked does not apply in our case (for both FID and KID). In addition, if you save the images as jpg, compression artifacts might also affect the scores.
Edit: going through the FID code again, the .npy array should indeed be in [-1,1]. The FID code transforms it to [0,1] and then to [-1,1] again.
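(Concretely, the saving step the thread converges on would look roughly like this; the batch count, shapes, and filename here are made up:)

```python
import numpy as np
import torch

batches = []
for _ in range(10):  # hypothetical: 10 generation batches
    fake = torch.rand(8, 3, 256, 256) * 2 - 1  # stand-in for generator output
    batches.append(fake.cpu().numpy())

# One large array kept in the raw [-1, 1] range; saving as .npy also avoids
# the jpg compression artifacts mentioned above.
np.save('fake_images.npy', np.concatenate(batches, axis=0))
```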
I also saved the results in a large .npy file as instructed, and the original range was [-1,1].
Let's set the FID aside for now, since it is sensitive to dataset size.
At 256 resolution, with both real and generated images in the [-1,1] range and truncation_ratio = 1, using 5k images, the KID I got is 0.166 (0.002); in the paper it was 2.65. How does this result relate to that 2.65?
You need to multiply the KID result by 1000
the one inside the brackets (0.002)?
No, that's the variance. You need to multiply the mean.
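(In other words, papers report KID x 1000, and the bracketed number is the variance, not part of the score; a toy example:)

```python
kid_mean, kid_var = 0.00265, 0.0001  # hypothetical raw output, printed as "mean (variance)"

# Papers report KID x 1000; the bracketed variance is not scaled into the score.
print(f'KID x 1000 = {kid_mean * 1000:.2f}')  # -> KID x 1000 = 2.65
```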
How many real images do you use?