huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
26.05k stars 5.36k forks

Request to add diffusion-GAN model #5905

Closed JunbongJang closed 9 months ago

JunbongJang commented 11 months ago

Is your feature request related to a problem? Please describe. I don't see any models related to diffusion-GAN in the diffusers library.

Describe the solution you'd like. Is there a plan to support diffusion-GAN models in the diffusers library? In particular, I would like support for the latest diffusion-GAN model, UFOGen.

Thank you.

Additional context. Reference papers:

- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs: https://arxiv.org/pdf/2112.07804.pdf
- Diffusion-GAN: Training GANs with Diffusion: https://arxiv.org/pdf/2206.02262.pdf
- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs: https://arxiv.org/pdf/2311.09257.pdf

patrickvonplaten commented 11 months ago

Do we have any powerful Diffusion-GAN weights that are published?

dg845 commented 11 months ago

I would be interested in working on this if the maintainers think it's a good idea :). There does seem to be a lack of publicly available checkpoints and code though (especially for UFOGen, perhaps because it's very recent).

TL;DR:


A short summary of the papers and some implementation notes:

Based on the above, I believe a single pipeline can support all models based on DDGAN, because the sampling procedure stays relatively unchanged between the different models. However, DDGAN and UFOGen will probably require their own schedulers because they model different distributions: DDGAN models $p_\theta(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t}) = q(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_t}, \boldsymbol{x_0} = G_\theta(\boldsymbol{x_t}, \boldsymbol{z}, t))$ while UFOGen models $p_\theta(\boldsymbol{x_{t - 1}}) = q(\boldsymbol{x_{t - 1}} \mid \boldsymbol{x_0} = G_\theta(\boldsymbol{x_t}, t))$.
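To make the shared sampling step concrete, here is a minimal NumPy sketch (not the diffusers API; the function name is hypothetical) of sampling $\boldsymbol{x_{t-1}} \sim q(\boldsymbol{x_{t-1}} \mid \boldsymbol{x_t}, \boldsymbol{x_0})$ given an $\boldsymbol{x_0}$ prediction. Per the formulas above, the only per-model difference would be how $\boldsymbol{x_0}$ is produced: a DDGAN generator takes an extra latent $\boldsymbol{z}$, while a UFOGen generator does not.

```python
import numpy as np

def ddpm_posterior_step(x_t, x0_pred, t, alphas_cumprod, betas, rng):
    """Sample x_{t-1} ~ q(x_{t-1} | x_t, x_0 = x0_pred).

    Uses the standard DDPM posterior coefficients (Ho et al. 2020, Eq. 7);
    x0_pred would come from the DDGAN/UFOGen generator.
    """
    alpha_t = 1.0 - betas[t]
    abar_t = alphas_cumprod[t]
    abar_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    # Posterior mean is a convex-like combination of x0_pred and x_t.
    coef_x0 = np.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)
    coef_xt = np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    mean = coef_x0 * x0_pred + coef_xt * x_t
    var = betas[t] * (1.0 - abar_prev) / (1.0 - abar_t)
    # No noise is added on the final (t = 0) step.
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(var) * noise
```

Since diffusion GANs use very few steps (UFOGen targets a single step), a sampler would call this only a handful of times, re-predicting $\boldsymbol{x_0}$ at each step.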

I think it might also be worth supporting discriminator model architectures for training, since DDGAN, as well as the recently released Adversarial Diffusion Distillation (ADD) paper (used to produce the SD-XL 1.0 Turbo checkpoint), uses a discriminator. Some papers use a U-Net discriminator, which is likely already supported, but others (such as ADD, to the best of my knowledge) use architectures that are not.
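For reference, the adversarial objectives involved can be sketched as follows (a hedged NumPy sketch, not anything in diffusers; per my reading, DDGAN trains with a non-saturating/softplus objective while ADD uses a hinge loss):

```python
import numpy as np

def hinge_d_loss(real_logits, fake_logits):
    """Hinge discriminator loss (ADD-style): penalize real logits below +1
    and fake logits above -1."""
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def nonsaturating_g_loss(fake_logits):
    """Non-saturating generator loss (DDGAN-style): softplus(-logits),
    i.e. -log sigmoid(D(fake))."""
    return np.mean(np.log1p(np.exp(-fake_logits)))
```

A discriminator that is confidently correct (real logits ≥ 1, fake logits ≤ -1) drives the hinge loss to zero, which is why the gradient pressure stays on hard examples.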

JunbongJang commented 11 months ago

Thank you for your interest! I look forward to seeing diffusion GAN on diffusers.

PeiqinSun commented 11 months ago

I am also trying to use diffusers to reproduce UFOGen. Can anyone discuss some details with me?

lileilai commented 11 months ago

> I am also trying to use diffusers to reproduce UFOGen. Can anyone discuss some details with me?

Yes, I am also interested in reproducing the UFOGen method, and I am confused by some details in the paper. Have you made any progress on that?

kadirnar commented 10 months ago

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

https://arxiv.org/abs/2311.16567

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ControllableGeneration commented 6 months ago

> I am also trying to use diffusers to reproduce UFOGen. Can anyone discuss some details with me?

We need a chat group for this. Please add me if there is one.