kpandey008 / DiffuseVAE

Official implementation of "DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents"

FID evaluation details #7

Closed · ZhaoyangLyu closed this 2 years ago

ZhaoyangLyu commented 2 years ago

Hi, I have some questions about FID evaluation for CIFAR-10 and CelebA-64. May I know how many images you generate to compute FID for CIFAR-10 and CelebA-64, respectively? And which split of the CIFAR-10 and CelebA-64 datasets do you use as the reference set? I am assuming that the training set of CIFAR-10 (50,000 images) and the whole CelebA dataset (202,599 images) are used to compute FID. Is that correct?

kpandey008 commented 2 years ago

Hi, I used the training split of CIFAR-10 and the entire CelebA-64 dataset as the reference sets for FID computation. For both CIFAR-10 and CelebA-64, the FID score is computed over 50k generated samples. We used the torch-fidelity (https://github.com/toshas/torch-fidelity) package to compute the FID scores. The samples used for computing FID will be made publicly available as well.
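For reference, a minimal sketch of this kind of FID computation with torch-fidelity; the sample directory is a placeholder, not the path used in the paper:

```python
import torch_fidelity

# FID between 50k generated samples (a directory of image files) and the
# CIFAR-10 training split; 'cifar10-train' is an input name registered by
# torch-fidelity. 'generated_samples/' is a placeholder path.
metrics = torch_fidelity.calculate_metrics(
    input1='generated_samples/',
    input2='cifar10-train',
    cuda=True,
    fid=True,
)
print(metrics['frechet_inception_distance'])
```

For CelebA-64, `input2` would instead point at a directory containing the 202,599 images resized to 64×64, since CelebA is not among torch-fidelity's registered datasets.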

ZhaoyangLyu commented 2 years ago

Thanks for your reply! That really helps! I have two more questions.

  1. We need to resize the original images in CelebA to 64×64 before computing FID. Is that correct?
  2. Do you think it is appropriate to use the Inception Score to evaluate generation quality on the CelebA dataset?
kpandey008 commented 2 years ago

1. Yes, we need to resize the original images to 64×64. FID is sensitive to the type of resizing used; we reported all our results after resizing the images to 64×64 with bilinear interpolation.
2. Sure, the Inception Score can be reported for CelebA-64, but I found previous work to report only FID, so we do not report the Inception Score for this dataset.
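A minimal sketch of that resizing step, assuming the CelebA images sit in a local directory (both paths are placeholders):

```python
from PIL import Image

# Hypothetical preprocessing: resize one CelebA image to 64x64 using
# bilinear interpolation before computing FID. FID is sensitive to the
# interpolation method, so the reference images should be resized the
# same way as in the reported results.
img = Image.open('celeba/000001.jpg').convert('RGB')
img_64 = img.resize((64, 64), Image.BILINEAR)
img_64.save('celeba_64/000001.png')
```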

ZhaoyangLyu commented 2 years ago

Thanks for your response!