The idea is to use the ImageNet-1K dataset to compute FID for a text-conditional diffusion model.
We could take 100 images from each ImageNet class as the real images and, using the diffusion model under evaluation, generate 100 matching images with a prompt like f"a photograph of {class_name}". These two sets of images would then be used to calculate an FID for each ImageNet class. The final FID score could be the mean of the per-class FID scores.
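As a rough sketch of the proposal, assuming the Inception features for each class have already been extracted into NumPy arrays (the extraction step is not shown), the per-class FID and its mean could be computed along these lines. The function names and the dictionary layout are hypothetical, and the trace of the matrix square root is taken from the eigenvalues of the covariance product instead of an explicit matrix square root, purely to keep the example dependent on NumPy alone.

```python
import numpy as np


def make_prompt(class_name):
    # Prompt template from the proposal; `class` is a reserved word in Python,
    # so the placeholder is named `class_name` here.
    return f"a photograph of {class_name}"


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) arrays."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product; tiny negative or imaginary parts are
    # numerical noise and are discarded.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


def mean_per_class_fid(real_feats_by_class, gen_feats_by_class):
    """Compute one FID per class and return (mean FID, per-class dict).

    Both arguments are assumed to map a class name to an array of shape
    (n_images, feature_dim), e.g. 100 rows per class in the proposal.
    """
    per_class = {
        name: frechet_distance(real_feats_by_class[name], gen_feats_by_class[name])
        for name in real_feats_by_class
    }
    return float(np.mean(list(per_class.values()))), per_class
```

One caveat worth keeping in mind: with 100 images per class, a sample covariance over 2048-dimensional Inception features has rank at most 99, so the per-class values are likely to be noisy and are probably not comparable to FID numbers computed on larger sample sets.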