Hi! Thanks for your interest. Not really. We just embed the guidance through a new layer. In more detail:
the guidance has the following form: $\epsilon^{w}(x_{t}, t, c) = \epsilon_{\theta}(x_{t}, t, 0) + w \left( \epsilon_{\theta}(x_{t}, t, c) - \epsilon_{\theta}(x_{t}, t, 0) \right)$. As you can see, the neural network is called twice: $\epsilon_{\theta}(x_{t}, t, 0)$ and $\epsilon_{\theta}(x_{t}, t, c)$.
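For concreteness, here is a minimal sketch of that formula in PyTorch style (the names `eps_model`, `x_t`, `c` are placeholders, not from any released code; `None` stands for the null condition $0$):

```python
def guided_eps(eps_model, x_t, t, c, w):
    # eps_theta(x_t, t, 0): unconditional pass with the null condition
    eps_uncond = eps_model(x_t, t, None)
    # eps_theta(x_t, t, c): conditional pass
    eps_cond = eps_model(x_t, t, c)
    # eps^w(x_t, t, c): linear combination of the two predictions
    return eps_uncond + w * (eps_cond - eps_uncond)
```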
We just distill this guidance into a model with a new embedding layer. That is, we solve the following task: minimize $\| \epsilon^{w}(x_{t}, t, c) - \epsilon(x_{t}, t, c, w) \|$, where $w$ enters through a new embedding layer (implemented in the same way as the $t$ embedding). In this way, we have one forward pass instead of two.
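A rough sketch of this objective, assuming the student's forward pass accepts an extra `w_emb` input that is fused with the timestep embedding (all names here are hypothetical; `guided_eps` is the two-pass teacher from the snippet above):

```python
import math
import torch

def w_embedding(w, dim=256):
    # Sinusoidal features for the scalar guidance weight w, built the
    # same way as a standard timestep embedding.
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = w.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

def distill_loss(student, teacher, x_t, t, c, w):
    # Target: the guided prediction eps^w(x_t, t, c) -- two teacher passes.
    with torch.no_grad():
        target = guided_eps(teacher, x_t, t, c, w.view(-1, 1, 1, 1))
    # Student: a single pass, conditioned on w through the new embedding layer.
    pred = student(x_t, t, c, w_emb=w_embedding(w))
    return torch.mean((pred - target) ** 2)
```

In practice $w$ would be sampled per training example so that a single student network covers a range of guidance scales.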
To make the guidance dynamic, after distillation we can make $w$ dependent on $t$. We use a step function $w(t)$: if $t > \tau$, then $w(t) = 0$; otherwise $w(t) = w$.
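As a tiny sketch (scalar $t$; `tau` and `w` as above):

```python
def w_schedule(t, tau, w):
    # Step function: no guidance for t > tau, constant weight w otherwise.
    return 0.0 if t > tau else w
```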
Thanks for your clear answer! Now I understand how the distilled models work.
So what should I do if I want to train the model with another distillation teacher, like the newly released SD3?
There seems to be no official implementation of the paper "On Distillation of Guided Diffusion Models".
Yes, you are right. There is no official implementation. We will probably release the code for guidance distillation when we have free time. However, it is not necessary to do Consistency Distillation on top of a guidance-distilled model. In other words, if you want to play with SD3, you can skip the guidance distillation step and use our CD directly with the SD3 teacher.
Ok, thanks a lot for your kind response!!
According to my own understanding, these checkpoints are distilled SD models that generate with dynamic guidance, is that right? Also, could you provide more training details for such models? Thanks a lot in advance!