Open DRJYYDS opened 1 year ago
Hi @DRJYYDS. I haven't computed FID scores yet, but I just wrote a script for that.
FYI, I got an FID of 20 using @junhsss 's code.
I also calculated FID with https://github.com/mseitzer/pytorch-fid using the same 10k generated images (against the whole CIFAR-10 dataset) and got an FID of 56.
Got it, thanks! It's interesting to see the FID gap. When you calculated the FID with pytorch-fid, did you first save the images to disk and then read them back? The FID score on CIFAR-10 is known to be sensitive to the image format.
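Since the format sensitivity mentioned above can matter, here is a minimal sketch of saving generated samples losslessly as PNG before running pytorch-fid. It assumes the samples are float arrays in [0, 1] of shape (N, 32, 32, 3); the function and directory names are just illustrative, not from any repo here.

```python
# Hedged sketch: save generated CIFAR-10 samples losslessly before FID.
# JPEG compression shifts the Inception statistics, so PNG is the safer choice.
import os
import numpy as np
from PIL import Image

def save_samples_png(samples, out_dir):
    """samples: float array-like in [0, 1], shape (N, H, W, 3)."""
    os.makedirs(out_dir, exist_ok=True)
    for i, img in enumerate(samples):
        # Quantize to uint8 exactly once, then write PNG (lossless).
        arr = (np.clip(img, 0.0, 1.0) * 255).round().astype(np.uint8)
        Image.fromarray(arr).save(os.path.join(out_dir, f"{i:05d}.png"))

# pytorch-fid is then run on the two folders, e.g.:
#   python -m pytorch_fid generated_pngs/ cifar10_pngs/
```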
Hi @DRJYYDS ,
sorry for the confusion.
In @junhsss 's implementation, the ground-truth folder has 10k images, but the FID of 56 was calculated against the whole CIFAR-10 dataset (60k images in total).
I just reran pytorch-fid with the same 10k images and got an FID of 21, which agrees well :)
PS: I have my own implementation of consistency models, which differs slightly from this one in the UNet architecture, LPIPS model, etc. With it I get an FID of 41 from pytorch-fid using 10k generated samples and 60k GT images (the model was trained with batch size 160 for 70k total steps).
Both are still worse than the FIDs reported in the original paper (8.7 for one step and 5.8 for two steps).
Thanks for your reply!
I believe you now have the correct FID. By the way, in most papers the reported FID is calculated between 50k generated images and 50k real images.
It's interesting to see the performance gap between your implementation, @junhsss 's implementation, and the FID reported in the original paper. It may indicate that the Consistency Model is somewhat sensitive to certain settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)
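For reference, the quantity all of these tools compute is the Fréchet distance between two Gaussians fitted to Inception features, which is also why the number of images on each side matters (the mean/covariance estimates change). A minimal NumPy/SciPy sketch, with the Inception feature extraction omitted and variable names purely illustrative:

```python
# Hedged sketch of the FID formula itself:
#   FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})
# Libraries like pytorch-fid apply this to Inception pool3 activations.
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """feats_*: (N, D) arrays of features for each image set."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```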
Thank you for your information, I will try it with more samples :)
I will look further into the differences from the original implementation if I get some spare time...
Good luck to you! If you have any questions, we can discuss them together; I have also been working on the Consistency Model recently.
Hi guys, I would like to ask how many training iterations you used when calculating FID. Could the high FID be due to insufficient training?
Sure. According to the follow-up work "Improved Techniques for Training Consistency Models" by Yang Song et al., training on CIFAR-10 needs 8000 epochs.
I re-read the original consistency models paper and there is one point that confuses me. In Table 3, they set the EMA decay rate to 0.9999 for CT on the CIFAR-10 dataset. Does this mean we need another EMA model (besides $\theta^{-}$) to evaluate FID?
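For what it's worth, here is a minimal sketch of the two running averages being discussed: the target network $\theta^{-}$ used in the CT loss (updated with the schedule's $\mu_k$) versus a separate evaluation EMA with decay 0.9999. All names here are illustrative, not from this repo or the official code.

```python
# Hedged sketch: both the CT target network and an evaluation checkpoint
# are exponential moving averages of the online parameters, just with
# different decay rates.
def ema_update(avg_params, online_params, decay):
    """In-place EMA: avg <- decay * avg + (1 - decay) * online."""
    for k in online_params:
        avg_params[k] = decay * avg_params[k] + (1.0 - decay) * online_params[k]

# After each optimizer step one would update both averages, e.g.:
#   ema_update(theta_minus, theta, mu_k)    # target net theta^- for the CT loss
#   ema_update(theta_eval, theta, 0.9999)   # separate EMA used for FID eval
```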
Hi, this is an excellent repo. May I ask what FID you obtained?