Minimum Images for Evaluation, question about dimensions

mseitzer / pytorch-fid

Compute FID scores with PyTorch.

Apache License 2.0

Minimum Images for Evaluation, question about dimensions #78

guitar9 closed this issue 2 years ago

guitar9 commented 2 years ago

I have trained a GAN for generating sculptures and I have only 20 images in my test dataset. Can I use the FID metric for evaluation? As I read it, the dimensionality then automatically decreases to 64. What does it mean when the dimension is smaller? Does it mean that low-level features (e.g. edges and lines) are used rather than high-level feature classes? What exactly do the dims 64, 192, 768, 2048 mean?

GiangHLe commented 2 years ago

As far as I know:

1/ A GAN doesn't need a validation or test set; its goal is to map a distribution over the latent space (normally a Gaussian) to the dataset's distribution. Therefore, you can compute the FID score between your synthesized dataset and your training dataset.
2/ The FID paper recommends a minimum sample size of 10,000 to calculate the FID; otherwise the true FID of the generator is underestimated. However, I think they prefer over 50,000.
3/ "What does dim 64, 192, 768, 2048 exactly mean?" I think mseitzer explained this well in the code comments: https://github.com/mseitzer/pytorch-fid/blob/3d604a25516746c3a4a5548c8610e99010b2c819/src/pytorch_fid/inception.py#L24-L46

Also, I don't understand your question about the dimension. Can you clarify it?
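
For readers who don't want to follow the link, here is a small sketch of what the four dims refer to. The layer descriptions paraphrase the comments in the linked `inception.py`; the `FEATURE_LAYERS` dict and `describe_dim` helper are made up for illustration and are not part of pytorch-fid's API.

```python
# Illustrative lookup table (hypothetical, not pytorch-fid's API).
# Keys are the supported feature dimensionalities; values describe the
# Inception layer the features are taken from, ordered shallow to deep.
FEATURE_LAYERS = {
    64: "first max pooling features",
    192: "second max pooling features",
    768: "pre-aux classifier features",
    2048: "final average pooling features",
}


def describe_dim(dim):
    """Return a short description of the layer behind a feature dimensionality."""
    if dim not in FEATURE_LAYERS:
        raise ValueError(f"unsupported dim {dim}; choose from {sorted(FEATURE_LAYERS)}")
    return FEATURE_LAYERS[dim]


for dim in sorted(FEATURE_LAYERS):
    print(dim, "->", describe_dim(dim))
```

Since the smaller dims come from earlier layers of the network, it seems reasonable to read them as lower-level features than the 2048-dim pooling output, which is what the standard FID uses.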

RayGuo-C commented 2 years ago

@GiangHLe, I ran into a similar problem to @guitar9's. Let me explain what confuses me. Suppose we have 200 (or 2000) images to train on, and we need to compare the real and generated images with the FID metric. Then I have the following questions:

Can we calculate the FID with dim 2048? Features of this dim certainly include all the features in the training dataset.
Is there a relationship between the size of the training dataset and the feature dimension when calculating the FID? Currently, I think the problem with a small dataset would be overfitting, but I don't know whether that's right.

mseitzer commented 2 years ago

@GiangHLe is correct. I will explain the relationship between feature dimensionality and number of images.

If you use features of dimensionality N, you need at least N images; otherwise the FID computation might not work. For more stable and accurate results, the authors recommend more images, such as 50,000. As an option, this package offers lower feature dimensionalities, which might be useful if fewer images are available. However, it is unclear whether the resulting FID values are still a good metric for image quality. They are definitely not comparable to the standard FID on 2048-dim features.
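
A rough numerical illustration of the point about image count: FID is computed from the mean and covariance of the features, and the sample covariance of n feature vectors has rank at most n - 1, so with too few images the covariance matrix cannot be full rank. The sketch below uses random vectors as a stand-in for real Inception activations; `covariance_rank` is a hypothetical helper, not part of pytorch-fid.

```python
import numpy as np


def covariance_rank(num_images, dim, seed=0):
    """Numerical rank of the sample covariance of random `dim`-dimensional features.

    The random vectors are only a stand-in (assumption) for Inception
    activations; real features would come from the network.
    """
    rng = np.random.default_rng(seed)
    feats = rng.standard_normal((num_images, dim))  # shape: (num_images, dim)
    sigma = np.cov(feats, rowvar=False)             # shape: (dim, dim)
    return int(np.linalg.matrix_rank(sigma))


# 20 images with 64-dim features: the covariance is rank deficient.
rank_small = covariance_rank(num_images=20, dim=64)
# 200 images with 64-dim features: enough samples for a full-rank covariance.
rank_large = covariance_rank(num_images=200, dim=64)

print("rank with 20 images: ", rank_small, "of 64")
print("rank with 200 images:", rank_large, "of 64")
```

Under these assumptions, 20 images are too few even for the 64-dim features, while a few hundred images are enough for 64 or 192 dims but not for 2048.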