huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

the FID for the Stable Diffusion #2681

Closed jiayisunx closed 1 year ago

jiayisunx commented 1 year ago

The NVIDIA paper "eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers" says the zero-shot FID for Stable Diffusion on the COCO 2014 validation set can be 8.59: (image) I also see a chart of FID vs. CLIP scores at https://huggingface.co/runwayml/stable-diffusion-v1-5, but it gives no specific numbers: (image)

Can you tell me the official FID score for Stable Diffusion?

patrickvonplaten commented 1 year ago

To be honest, I don't really know either. Gently pinging the original author, @pesser.

jiayisunx commented 1 year ago

Alternatively, would it be possible to share the specific scores from the FID vs. CLIP chart I mentioned above? I would also really appreciate it if you could share the code to reproduce those scores. Thanks!


ntajbakhsh commented 1 year ago

The numbers you read from the graph are based on a 10k validation set, whereas I think the numbers in the table are based on a 30k set. In general, the larger the validation set, the smaller the FID. The graph below shows the plot we generated for HF-SD 1.5 using a 30k set. As you can see, the lowest FID is close to what you see in the table. In case you are curious, the other curve, labeled NeMo-SD, is our re-implementation of SD, which we release along with a convergence recipe as part of NVIDIA's NeMo Multimodal. (image)
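The sample-size effect described above can be illustrated with a small, self-contained sketch. This is not the script behind the plot, and it is not a real FID computation: it assumes synthetic Gaussian "features" in place of Inception activations and uses a diagonal-covariance simplification of the Fréchet distance. The function names (`diagonal_frechet_distance`, `sample_features`, `distance_for_set_size`) and the set sizes are hypothetical and scaled down so the example runs quickly. Both sets are drawn from the same distribution, so the true distance is zero and whatever the estimate reports is pure finite-sample error.

```python
import math
import random


def diagonal_frechet_distance(feats_a, feats_b):
    """Frechet distance between two diagonal Gaussians fitted to the features.

    Simplification (assumption): real FID uses full covariance matrices of
    Inception features; here every feature dimension is treated independently.
    """
    dim = len(feats_a[0])
    total = 0.0
    for j in range(dim):
        col_a = [row[j] for row in feats_a]
        col_b = [row[j] for row in feats_b]
        mean_a = sum(col_a) / len(col_a)
        mean_b = sum(col_b) / len(col_b)
        # Unbiased sample variance for this dimension.
        var_a = sum((x - mean_a) ** 2 for x in col_a) / (len(col_a) - 1)
        var_b = sum((x - mean_b) ** 2 for x in col_b) / (len(col_b) - 1)
        # Per-dimension Frechet term: squared mean gap + squared std gap.
        total += (mean_a - mean_b) ** 2 + (math.sqrt(var_a) - math.sqrt(var_b)) ** 2
    return total


def sample_features(rng, n, dim):
    """Draw n synthetic feature vectors from a standard normal (stand-in data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def distance_for_set_size(n, dim=32, seed=0):
    """Estimate the distance between two size-n sets from the SAME distribution."""
    rng = random.Random(seed)
    set_a = sample_features(rng, n, dim)
    set_b = sample_features(rng, n, dim)
    return diagonal_frechet_distance(set_a, set_b)


if __name__ == "__main__":
    for n in (100, 3000):
        print(n, distance_for_set_size(n))
```

In this toy setting the estimate for the smaller set comes out larger than for the bigger set, which is the same direction as the 10k vs. 30k gap discussed above. Scores computed on different set sizes are therefore not directly comparable.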

jiayisunx commented 1 year ago

Hi @ntajbakhsh, thank you for your reply! Can you please share the script to reproduce the scores?