jiayisunx closed this issue 1 year ago.
To be honest, I also don't really know here, so gently pinging the original author @pesser.
Alternatively, would it be possible to share the specific scores behind the FID vs. CLIP score chart I mentioned above? I would also really appreciate it if you could share the code to reproduce those scores. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The numbers you read from the graph are based on a 10k validation set, while I think the numbers in the table are based on a 30k set. In general, the larger the validation set, the lower the FID. In the graph below, we have generated the plot for HF-SD 1.5 using a 30k set. As you can see, the lowest FID is close to what you see in the table. In case you are curious, the other curve, labeled NeMo-SD, is our re-implementation of SD, which we release along with a convergence recipe as part of NVIDIA's NeMo Multimodal.
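To illustrate why a larger validation set tends to give a lower FID, here is a minimal sketch (not the script used for any of the numbers in this thread). FID is the Fréchet distance between two Gaussians fitted to image features, ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2(S_a S_b)^(1/2)). Because the means and covariances are estimated from a finite sample, the estimate is biased upward, and the bias shrinks as the sample grows. The sketch assumes synthetic 16-dimensional Gaussian features in place of the 2048-dimensional Inception-v3 features that real FID uses; the helper names `frechet_distance` and `fid_same_distribution` are hypothetical.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style Frechet distance between two feature sets of shape (n, d)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # PSD covariances (clip tiny negative round-off before the sqrt).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def fid_same_distribution(n: int, dim: int = 16, seed: int = 0) -> float:
    """Distance between two independent size-n samples of the SAME N(0, I).

    The true distance is 0, so any positive value is pure finite-sample bias
    (synthetic stand-in for Inception features; an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, dim))
    b = rng.standard_normal((n, dim))
    return frechet_distance(a, b)


if __name__ == "__main__":
    # The estimated distance drops as the "validation set" grows.
    for n in (100, 1_000, 10_000):
        print(n, round(fid_same_distribution(n), 4))
```

Running it should print values that shrink roughly in proportion to 1/n, which mirrors (in a toy setting) why FIDs computed on a 10k set come out higher than those computed on a 30k set. For real measurements, the FID implementations in `pytorch-fid` or `torchmetrics` compute the same formula on Inception-v3 features.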
Hi @ntajbakhsh, thank you for your reply! Can you please share the script to reproduce the scores?
In the NVIDIA paper "eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers", it says the zero-shot FID of Stable Diffusion on the COCO2014 validation set is 8.59. I also see a chart of FID vs. CLIP scores at https://huggingface.co/runwayml/stable-diffusion-v1-5, but no specific numbers.
Can you tell me the official FID score for Stable Diffusion?