Closed: JunbongJang closed this issue 9 months ago.
Do we have any powerful Diffusion-GAN weights that are published?
I would be interested in working on this if the maintainers think it's a good idea :). There does seem to be a lack of publicly available checkpoints and code though (especially for UFOGen, perhaps because it's very recent).
TL;DR: I think it would be worth adding discriminator model architectures to `/src/diffusers/models/` to aid in training DDGAN-style models and closely related models like ADD/SD-XL Turbo.

A short summary of the papers and some implementation notes:
Diffusion-GAN: instead of asking a standard GAN discriminator $D_\phi(\boldsymbol{y})$ to distinguish between real and generated samples, we ask a timestep-dependent discriminator $D_\phi(\boldsymbol{y}, t)$ to distinguish between noised real samples $\boldsymbol{y} \sim q(\boldsymbol{y} \mid \boldsymbol{x}, t)$ and noised fake samples $\boldsymbol{y_g} \sim q(\boldsymbol{y_g} \mid G_\theta(\boldsymbol{z}), t)$ from a generator $G_\theta(\boldsymbol{\cdot})$ and noise $\boldsymbol{z}$ sampled from some prior distribution $p(\boldsymbol{z})$, for every noise level $t$ of a diffusion process with forward process posterior $q(\boldsymbol{\cdot} \mid \boldsymbol{x}, t)$. The trained model is just the GAN pair $(D_\phi(\boldsymbol{y}, t), G_\theta(\boldsymbol{z}))$, and sampling from the model just involves sampling from the GAN generator $G$. So I'm not sure it would make sense to have a `DiffusionGanPipeline`, but a training example could be valuable.
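To make the training-example idea concrete, here is a minimal sketch of what a Diffusion-GAN-style discriminator update could look like. It is only an illustration: `generator` and `discriminator` are hypothetical placeholder modules, and I'm reusing `DDPMScheduler.add_noise` purely as a convenient implementation of the forward process $q(\boldsymbol{y} \mid \boldsymbol{x}, t)$ (the paper also adapts the maximum noise level during training, which is omitted here).

```python
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# `generator` and `discriminator` are hypothetical placeholders:
#   generator(z)        -> fake images
#   discriminator(y, t) -> real/fake logits for noised inputs y at timestep t
scheduler = DDPMScheduler(num_train_timesteps=1000)


def diffusion_gan_discriminator_loss(generator, discriminator, real_images, latent_dim=128):
    batch_size, device = real_images.shape[0], real_images.device

    # Sample a noise level t per example (uniform sampling keeps the sketch simple).
    t = torch.randint(0, scheduler.config.num_train_timesteps, (batch_size,), device=device)

    # Fake samples come from a plain GAN generator fed with prior noise z.
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_images = generator(z)

    # Noise BOTH real and fake samples with the same forward process q(y | x, t).
    noisy_real = scheduler.add_noise(real_images, torch.randn_like(real_images), t)
    noisy_fake = scheduler.add_noise(fake_images.detach(), torch.randn_like(fake_images), t)

    # Timestep-dependent discriminator D(y, t) with the non-saturating GAN loss.
    real_logits = discriminator(noisy_real, t)
    fake_logits = discriminator(noisy_fake, t)
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()
```

The only diffusers-specific piece is the scheduler; everything else is a standard GAN loss with the timestep passed to the discriminator.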
$) but instead of modeling the true denoising distribution $q(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t})
$ with a Gaussian $p_\theta(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t})
$ and having a denoising model that predicts the mean of that Gaussian, the true denoising distribution is modeled with a (conditional) GAN generator $G_\theta(\boldsymbol{\cdot} \mid \boldsymbol{\cdot}) = p_\theta(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t})
$. (The idea is that when the diffusion process has only a few steps, the true denoising distribution $q(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t})
$ is no longer well approximated by a Gaussian because it becomes a complex multimodal distribution, so it needs to be modeled by something which can capture such distributions such as a GAN.)p_\theta
$ with the true denoising distribution $q$ using an adversarial loss with a divergence (e.g. Jenson-Shannon divergence, Wasserstein distance, etc.). The time-dependent GAN discriminator $D_\phi(\boldsymbol{x_{t - 1}}, \boldsymbol{x_t}, t)
$ decides whether $\boldsymbol{x_{t - 1}}
$ is a plausible denoised version of $\boldsymbol{x_t}
$ given timestep $t
$.G_\theta
$; the DDGAN authors choose a parameterization corresponding to original source data ("sample"
) parameterization: $p_\theta(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t}) = \int{p(\boldsymbol{z})q(x_{t - 1} \mid x_t, x_0 = G_\theta(\boldsymbol{x_t}, \boldsymbol{z}, t))d\boldsymbol{z}}
$ where $G_\theta$ takes in a noisy sample $\boldsymbol{x_t}
$, latent noise $\boldsymbol{z}
$ sampled from a standard Gaussian, and timestep $t$ and predicts the original data $\boldsymbol{x_0}
$. We then calculate $\boldsymbol{x_{t - 1}}
$ from the predicted $\boldsymbol{x_0}
$ using the parameterization equation above (that is, for a hypothetical DDGANScheduler
, this would be the content of the step
method). Other parameterizations are possible and my understanding is that they don't necessarily map cleanly onto the current prediction_types
for normal diffusion models (e.g. epsilon
, v_prediction
).T (\approx 8)
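Here is a minimal sketch of what the `step` of such a hypothetical `DDGANScheduler` could look like, assuming a DDPM-style discrete schedule with an `alphas_cumprod` buffer and a `"sample"`-style prediction. The function name and signature are made up, and DDGAN's actual noise schedule differs, so treat this only as the shape of the computation:

```python
import torch


def ddgan_scheduler_step(model_output, timestep, sample, alphas_cumprod, generator=None):
    """Sample x_{t-1} ~ q(x_{t-1} | x_t, x_0 = model_output).

    model_output:   the GAN generator's prediction of x_0 (the "sample" prediction type)
    sample:         the current noisy sample x_t
    alphas_cumprod: 1-D tensor of cumulative alpha products, as in DDPMScheduler
    """
    t = int(timestep)
    alpha_prod_t = alphas_cumprod[t]
    alpha_prod_t_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    alpha_t = alpha_prod_t / alpha_prod_t_prev
    beta_t = 1.0 - alpha_t

    # Mean of the forward-process posterior q(x_{t-1} | x_t, x_0) (DDPM Eq. 7),
    # evaluated at the *predicted* x_0 rather than the true one.
    coef_x0 = alpha_prod_t_prev.sqrt() * beta_t / (1.0 - alpha_prod_t)
    coef_xt = alpha_t.sqrt() * (1.0 - alpha_prod_t_prev) / (1.0 - alpha_prod_t)
    posterior_mean = coef_x0 * model_output + coef_xt * sample

    if t == 0:
        return posterior_mean  # no noise is added at the final step

    posterior_variance = beta_t * (1.0 - alpha_prod_t_prev) / (1.0 - alpha_prod_t)
    noise = torch.randn(sample.shape, generator=generator, device=sample.device, dtype=sample.dtype)
    return posterior_mean + posterior_variance.sqrt() * noise
```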
UFOGen: builds on DDGAN, but the discriminator no longer conditions on $\boldsymbol{x_t}$, with the interpretation that it decides whether $\boldsymbol{x_{t-1}}$ comes from the marginal $q(\boldsymbol{x_{t-1}})$, and a new "auxiliary forward diffusion (AFD)" model $C_\psi(\boldsymbol{x_{t-1}}, t)$ is added to model $p_\theta(\boldsymbol{x_{t-1}})$ via regression (it isn't used during inference). The generator $G_\theta$ uses a new parameterization $p_\theta(\boldsymbol{x_{t-1}}) = q(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_0} = G_\theta(\boldsymbol{x_t}, t))$. That is, the denoising model $G_\theta(\boldsymbol{x_t}, t)$ takes in a noisy sample $\boldsymbol{x_t}$ sampled from the true forward process $q(\boldsymbol{x_t} \mid \boldsymbol{x_0})$ and timestep $t$ and predicts the clean data $\boldsymbol{x_0}$ at timestep $t = 0$ (as before). The training objective is similar, but with the term involving the AFD model $C_\psi$ replaced with a regression term involving the original clean data $\boldsymbol{x_0}$. This parameterization allows one-step sampling: draw $\boldsymbol{x_T} \sim \mathcal{N}(0, \boldsymbol{I})$ and then do a single forward pass of the generator, $\boldsymbol{\hat{x}_0} = G_\theta(\boldsymbol{x_T}, T)$, to get a sample $\boldsymbol{\hat{x}_0}$.
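So a UFOGen-style sampling path would be almost trivially simple; here is a sketch (unconditional, with a made-up `generator_model` that predicts $\boldsymbol{x_0}$ directly; a real text-to-image pipeline would additionally handle prompts, the VAE, and guidance):

```python
import torch


@torch.no_grad()
def ufogen_one_step_sample(generator_model, sample_shape, num_train_timesteps=1000, device="cuda"):
    """One-step sampling: x_0_hat = G(x_T, T) with x_T ~ N(0, I).

    `generator_model(x_t, t)` is assumed to predict x_0 directly.
    """
    # Start from pure Gaussian noise at the last timestep T.
    x_T = torch.randn(sample_shape, device=device)
    t = torch.full((sample_shape[0],), num_train_timesteps - 1, device=device, dtype=torch.long)

    # A single generator forward pass is the entire sampling procedure.
    return generator_model(x_T, t)
```

A multi-step variant would simply alternate this generator call with a $q(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_0})$ step, similar to the DDGAN sketch above.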
Based on the above, I believe a single pipeline can support all models based on DDGAN, because the sampling procedure stays relatively unchanged between the different models. However, DDGAN and UFOGen will probably require their own schedulers because they model different distributions: DDGAN models $p_\theta(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_t}) = q(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_t}, \boldsymbol{x_0} = G_\theta(\boldsymbol{x_t}, \boldsymbol{z}, t))$, while UFOGen models $p_\theta(\boldsymbol{x_{t-1}}) = q(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_0} = G_\theta(\boldsymbol{x_t}, t))$.
I think it might also be worth supporting discriminator model architectures for training, since DDGANs as well as the recently released Adversarial Diffusion Distillation (ADD) paper, which was used to produce the SD-XL Turbo checkpoint, use a discriminator. Some papers use a U-Net discriminator, which is likely already supported, but others (such as ADD, to the best of my knowledge) use architectures that are not; see the sketch below.
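As a rough illustration of the kind of reusable building block I have in mind (the class and its API are entirely hypothetical, and this is not the actual ADD or DDGAN discriminator architecture), a small timestep-conditioned discriminator could reuse the existing embedding modules in diffusers:

```python
import torch
import torch.nn as nn
from diffusers.models.embeddings import TimestepEmbedding, Timesteps


class TimeConditionedDiscriminator(nn.Module):
    """Illustrative timestep-conditioned patch discriminator (hypothetical)."""

    def __init__(self, in_channels=3, hidden=128, time_embed_dim=256):
        super().__init__()
        # Reuse the same sinusoidal timestep embedding blocks the UNets use.
        self.time_proj = Timesteps(hidden, flip_sin_to_cos=True, downscale_freq_shift=0)
        self.time_embed = TimestepEmbedding(hidden, time_embed_dim)
        self.time_to_bias = nn.Linear(time_embed_dim, hidden)

        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, kernel_size=4, stride=2, padding=1), nn.SiLU(),
        )
        self.head = nn.Conv2d(hidden, 1, kernel_size=3, padding=1)  # patch-wise logits

    def forward(self, images, timesteps):
        # Inject the timestep embedding as a per-channel bias on the features.
        temb = self.time_embed(self.time_proj(timesteps).to(images.dtype))
        features = self.backbone(images)
        features = features + self.time_to_bias(temb)[:, :, None, None]
        return self.head(features)
```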
Thank you for your interest! I look forward to seeing Diffusion-GAN in diffusers.
I'm also using diffusers to try to reproduce UFOGen. Can anyone help me discuss some details?
Yes, I'm also interested in reproducing the UFOGen method, and I'm confused about some details in the paper. Have you made any progress on that?
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
We need a chat group for this. Please add me if there is one.
Is your feature request related to a problem? Please describe.
I don't see any models related to diffusion-GAN in the diffusers library.

Describe the solution you'd like.
Is there a plan to support diffusion-GAN models in the diffusers library? In particular, I would like support for the latest diffusion-GAN model, UFOGen.
Thank you.
Additional context.
Reference papers:
- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs: https://arxiv.org/pdf/2112.07804.pdf
- Diffusion-GAN: Training GANs with Diffusion: https://arxiv.org/pdf/2206.02262.pdf
- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs: https://arxiv.org/pdf/2311.09257.pdf