junhsss / consistency-models

A Toolkit for OpenAI's Consistency Models.
https://arxiv.org/abs/2303.01469
MIT License

Questions about FID. #5

Open DRJYYDS opened 1 year ago

DRJYYDS commented 1 year ago

Hi, this is an excellent repo. Could I ask what FID you obtain?

junhsss commented 1 year ago

Hi @DRJYYDS. I haven't computed FID scores yet, but I just wrote a script for that.
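In case it helps others, here is a minimal sketch of what such a script could look like, using torchmetrics. This is just one possible approach, not necessarily what the repo's script does, and the folder paths are placeholders:

```python
# A possible FID script (assumption: illustrative only, not the repo's actual
# script). Requires `torchmetrics[image]`, which pulls in torch-fidelity.
from pathlib import Path

import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchvision import transforms

to_uint8 = transforms.Compose([
    transforms.Resize((32, 32)),  # CIFAR-10 resolution
    transforms.PILToTensor(),     # uint8 tensor, shape (3, H, W)
])

def load_folder(folder: str) -> torch.Tensor:
    paths = sorted(Path(folder).glob("*.png"))
    return torch.stack([to_uint8(Image.open(p).convert("RGB")) for p in paths])

fid = FrechetInceptionDistance(feature=2048)               # Inception-v3 pool features
fid.update(load_folder("real_images/"), real=True)         # placeholder path
fid.update(load_folder("generated_images/"), real=False)   # placeholder path
print(f"FID: {fid.compute().item():.2f}")
```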

yuanzhi-zhu commented 1 year ago

FYI, I got an FID of 20 using @junhsss's code.

I also calculated it with https://github.com/mseitzer/pytorch-fid using the same 10k generated images (compared against the whole CIFAR-10 dataset) and got an FID of 56.
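For reference, pytorch-fid is run on two image folders from the command line, e.g. `python -m pytorch_fid real_images/ generated_images/` (folder names here are placeholders), so the score depends directly on which reference set you point it at: 10k images vs. the full 60k.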

DRJYYDS commented 1 year ago

> FYI, I got an FID of 20 using @junhsss's code.
>
> I also calculated it with https://github.com/mseitzer/pytorch-fid using the same 10k generated images (compared against the whole CIFAR-10 dataset) and got an FID of 56.

Got it! Thanks. It's interesting to see the FID gap. When you calculated the FID with pytorch-fid, did you first save the images and then read them back? It's known that the FID score on CIFAR-10 is sensitive to the image format.
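On the format point: lossy encodings like JPEG perturb the pixel statistics enough to move FID at 32x32 resolution, so samples are usually written out losslessly. A minimal sketch (the tensor shape and paths are assumptions):

```python
# Write samples losslessly as PNG before computing FID.
# Assumption: `samples` is a float tensor in [0, 1] of shape (N, 3, 32, 32);
# re-encoding through a lossy format like JPEG can shift CIFAR-10 FID noticeably.
import os

import torch
from torchvision.utils import save_image

samples = torch.rand(16, 3, 32, 32)  # placeholder for actual model outputs
os.makedirs("generated_images", exist_ok=True)
for i, img in enumerate(samples):
    save_image(img, f"generated_images/{i:05d}.png")  # PNG keeps pixels exact
```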

yuanzhi-zhu commented 1 year ago

Hi @DRJYYDS, sorry for the confusion.

In @junhsss's implementation, the ground-truth folder has 10k images, but the FID of 56 was calculated against the whole CIFAR-10 dataset (60k images in total).

I just retried pytorch-fid with the same 10k images and got an FID of 21, which agrees well :)

PS: I have my own implementation of consistency models (slightly different from this one in the UNet architecture, LPIPS model, etc.). The FID I got is 41 with pytorch-fid, using 10k generated samples and 60k ground-truth images (the model was trained with a batch size of 160 for 70k steps in total).

Both are still worse than the FIDs reported in the original paper (8.7 for one step and 5.8 for two steps).

DRJYYDS commented 1 year ago

> Hi @DRJYYDS, sorry for the confusion.
>
> In @junhsss's implementation, the ground-truth folder has 10k images, but the FID of 56 was calculated against the whole CIFAR-10 dataset (60k images in total).
>
> I just retried pytorch-fid with the same 10k images and got an FID of 21, which agrees well :)
>
> PS: I have my own implementation of consistency models (slightly different from this one in the UNet architecture, LPIPS model, etc.). The FID I got is 41 with pytorch-fid, using 10k generated samples and 60k ground-truth images (the model was trained with a batch size of 160 for 70k steps in total).
>
> Both are still worse than the FIDs reported in the original paper (8.7 for one step and 5.8 for two steps).

Thanks for your reply!

I believe that you now have the correct FID. By the way, in most papers' reported results, FID is calculated between 50k generated images and 50k real images.

It's interesting to see the performance gap between your implementation, @junhsss's implementation, and the FID reported in the original paper. It may indicate that the consistency model is somewhat sensitive to certain settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)
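(A practical note: recent versions of pytorch-fid let you precompute the reference statistics once via its `--save-stats` flag, e.g. `python -m pytorch_fid --save-stats cifar_train/ cifar_stats.npz`, and then compare each batch of 50k generated samples against that file; the paths here are placeholders.)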

yuanzhi-zhu commented 1 year ago

> Thanks for your reply!
>
> I believe that you now have the correct FID. By the way, in most papers' reported results, FID is calculated between 50k generated images and 50k real images.
>
> It's interesting to see the performance gap between your implementation, @junhsss's implementation, and the FID reported in the original paper. It may indicate that the consistency model is somewhat sensitive to certain settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)

Thank you for the information, I will try it with more samples :)

I will look more into the differences from the original implementation if I get some spare time...

DRJYYDS commented 1 year ago


> Thank you for the information, I will try it with more samples :)
>
> I will look more into the differences from the original implementation if I get some spare time...

Good luck! If you have any questions, we can discuss them together; I have also been working on consistency models recently.

mo666666 commented 10 months ago

Hi guys, I would like to ask how many training iterations you used when you calculated FID. Could the high FID be due to insufficient training?

DRJYYDS commented 10 months ago

> Hi guys, I would like to ask how many training iterations you used when you calculated FID. Could the high FID be due to insufficient training?

Quite possibly. According to the follow-up work "Improved Techniques for Training Consistency Models" by Yang Song, training on CIFAR-10 needs about 8000 epochs.
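To put numbers on that: the run described above (batch size 160, 70k steps) sees 160 × 70,000 = 11.2M images, which is 11,200,000 / 50,000 ≈ 224 epochs over the CIFAR-10 training set, more than an order of magnitude short of ~8000 epochs, so insufficient training is a plausible explanation for the gap.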

mo666666 commented 10 months ago

I re-read the original consistency models paper and there is one point that confuses me. In Table 3, they propose an EMA decay rate of 0.9999 for CT on the CIFAR-10 dataset. Does this mean that we need another EMA model (besides $\theta^{-}$) to evaluate the FID?
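For what it's worth, here is a minimal sketch of how such a setup could look, assuming a separate evaluation EMA is maintained alongside the target network $\theta^{-}$. This is an illustration of the mechanics under that assumption, not the official implementation:

```python
# Sketch: target network theta^- (used in the CT loss) vs. a separate
# evaluation EMA (assumption: Table 3's 0.9999 rate applies to the latter).
import copy

import torch

@torch.no_grad()
def ema_update(ema_model: torch.nn.Module, model: torch.nn.Module, decay: float) -> None:
    """In-place EMA: ema = decay * ema + (1 - decay) * online."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.lerp_(p, 1.0 - decay)

model = torch.nn.Linear(8, 8)    # placeholder for the consistency model
target = copy.deepcopy(model)    # theta^-: target network used in the CT loss
eval_ema = copy.deepcopy(model)  # separate EMA used only for evaluation / FID

for step in range(1000):         # training-loop sketch; loss/optimizer omitted
    # ... compute the consistency loss against `target`, step the optimizer on `model` ...
    mu = 0.99                    # placeholder: theta^- decay follows the CT schedule mu(k)
    ema_update(target, model, decay=mu)
    ema_update(eval_ema, model, decay=0.9999)  # rate from Table 3 (assumed eval-only)
```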