Introduction
This note covers classifier-free guidance (CFG), the guidance factor used in Stable Diffusion. Unlike the truncation trick in GANs or the temperature in flow models, diffusion models have no built-in way to trade off image quality against variety. To obtain such a trade-off, Diffusion Models Beat GANs on Image Synthesis[^1] introduced classifier guidance, which uses the gradient of a pre-trained classifier at inference time. However, classifier guidance has drawbacks, and this paper proposes an alternative that achieves the same kind of guidance without any classifier.
Method
It can be proved that applying classifier guidance with weight $w+1$ to an unconditional model leads, in theory, to the same result as applying classifier guidance with weight $w$ to a conditional model. As a result, we can obtain the effect of classifier guidance by subtracting the unconditional model's output from the conditional model's output.
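The equivalence follows from Bayes' rule, which gives $\nabla_{z_t}\log p(c \mid z_t) = \nabla_{z_t}\log p(z_t \mid c) - \nabla_{z_t}\log p(z_t)$; substituting into the unconditional score guided with weight $w+1$:

$$
\begin{aligned}
\nabla_{z_t}\log p(z_t) + (w+1)\,\nabla_{z_t}\log p(c \mid z_t)
&= \nabla_{z_t}\log p(z_t) + (w+1)\left[\nabla_{z_t}\log p(z_t \mid c) - \nabla_{z_t}\log p(z_t)\right] \\
&= \nabla_{z_t}\log p(z_t \mid c) + w\,\nabla_{z_t}\log p(c \mid z_t),
\end{aligned}
$$

which is exactly the conditional score with classifier guidance of weight $w$.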
At training time, the conditional and unconditional models are trained jointly and share the same weights: with probability $p_{\text{uncond}}$, the class label is replaced by the null token $\phi$, so the same network also learns the unconditional score.
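A minimal sketch of this joint training step, assuming a noise-prediction setup; the names (`make_training_batch`, `NULL_TOKEN`) and the sentinel value for $\phi$ are hypothetical, not from the paper:

```python
import numpy as np

NULL_TOKEN = -1   # hypothetical sentinel standing in for the null label "phi"
P_UNCOND = 0.1    # probability of dropping the condition during training

def make_training_batch(x0, labels, alphas_bar, rng):
    """Return (z_t, t, labels, eps), with labels dropped w.p. P_UNCOND."""
    b = x0.shape[0]
    t = rng.integers(0, len(alphas_bar), size=b)
    eps = rng.standard_normal(x0.shape)
    a = alphas_bar[t].reshape(b, *([1] * (x0.ndim - 1)))
    z_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps  # forward diffusion q(z_t | x0)

    # With probability P_UNCOND, replace the label with the null token so the
    # shared-weight network also learns the unconditional score eps(z_t).
    drop = rng.random(b) < P_UNCOND
    labels = np.where(drop, NULL_TOKEN, labels)
    return z_t, t, labels, eps
```

The model would then be trained to predict `eps` from `(z_t, t, labels)` with a mean-squared-error loss, exactly as in standard diffusion training.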
In detail, classifier guidance perturbs the model's noise estimate with the classifier's gradient:

$$\tilde\epsilon_\theta(z_t, c) = \epsilon_\theta(z_t, c) - w\,\sigma_t\,\nabla_{z_t} \log p(c \mid z_t)$$

Applying this with weight $w+1$ to the unconditional model gives the same result as applying it with weight $w$ to the conditional model. At inference time, classifier-free guidance therefore needs only the two noise estimates:

$$\tilde\epsilon_\theta(z_t, c) = (1+w)\,\epsilon_\theta(z_t, c) - w\,\epsilon_\theta(z_t)$$
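At each sampling step, the shared network is evaluated twice, once with the class label and once with the null token, and the two noise estimates are extrapolated. A minimal sketch (hypothetical function name; `w` is the guidance weight):

```python
def cfg_eps(eps_cond, eps_uncond, w):
    """Classifier-free guided noise estimate:
    tilde_eps = (1 + w) * eps_cond - w * eps_uncond.
    Works elementwise on floats or arrays; in practice the two model
    evaluations are usually batched into a single forward pass."""
    return (1.0 + w) * eps_cond - w * eps_uncond
```

With $w = 0$ this reduces to the plain conditional model; increasing $w$ pushes samples toward the class at the cost of variety.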
Highlight
The classifier guidance introduced in Diffusion Models Beat GANs on Image Synthesis needs a pre-trained classifier, and since the classifier's gradient is added to the diffusion model's prediction at sampling time, it is reminiscent of a gradient-based adversarial attack on the classifier.
Classifier-free guidance may take less time than classifier guidance at inference, since there are no classifier gradients to compute. However, this is not guaranteed: classifier-free guidance must evaluate both the conditional and the unconditional model at every step to form the guidance, which can take longer (though the two forward passes are usually batched together).
Limitation
Comments
Could not fully understand the BACKGROUND section.
[^1]: Dhariwal and Nichol, "Diffusion Models Beat GANs on Image Synthesis", NeurIPS 2021.