SDXL unconditional generation is broken

morrisalp commented 2 weeks ago

Describe the bug

Running SDXL generation with CFG scale 1.0 and 0.0 gives exactly the same results, but CFG scale 0.0 should perform unconditional generation.

Reproduction

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Same prompt and seed for both calls; only guidance_scale differs
img1 = pipe(prompt=prompt, guidance_scale=1.0, generator=torch.Generator().manual_seed(0)).images[0]
img2 = pipe(prompt=prompt, guidance_scale=0.0, generator=torch.Generator().manual_seed(0)).images[0]
# Result: img1 and img2 are identical
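To confirm that the two outputs are pixel-identical rather than just visually similar, a small check like the following could be run after the reproduction script. It is a sketch assuming `img1` and `img2` are the `PIL.Image.Image` objects returned by the pipeline; `images_identical` is a hypothetical helper name, not part of diffusers.

```python
def images_identical(img_a, img_b):
    """Return True if two PIL images share mode, size and raw pixel bytes."""
    return (
        img_a.mode == img_b.mode
        and img_a.size == img_b.size
        and img_a.tobytes() == img_b.tobytes()
    )


# With the reproduction script, the report implies this prints True:
# print(images_identical(img1, img2))
```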

System Info

🤗 Diffusers version: 0.30.3
Platform: Linux-6.1.85+-x86_64-with-glibc2.35
Running on Google Colab?: Yes
Python version: 3.10.12
PyTorch version (GPU?): 2.4.1+cu121 (True)
Flax version (CPU?/GPU?/TPU?): 0.8.5 (gpu)
Jax version: 0.4.33
JaxLib version: 0.4.33
Huggingface_hub version: 0.24.7
Transformers version: 4.44.2
Accelerate version: 0.34.2
PEFT version: not installed
Bitsandbytes version: not installed
Safetensors version: 0.4.5
xFormers version: not installed
Accelerator: Tesla T4, 15360 MiB
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @sayakpaul @DN6

morrisalp commented 2 weeks ago

Note: I think this is because the StableDiffusionXLPipeline.do_classifier_free_guidance property uses the condition self._guidance_scale > 1, so any guidance_scale <= 1 skips CFG entirely and uses only the prompt-conditioned prediction. CFG scales below 1 (including 0, i.e. unconditional generation) are useful for some applications.
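The effect of that gating shows up in the standard CFG combination, noise = uncond + s * (cond - uncond): s = 0 recovers the unconditional prediction and s = 1 the conditional one, but if the combination is only applied when s > 1, both values fall through to the conditional prediction. The following is a simplified stand-in for one denoising step, not diffusers' actual code; `cfg_combine` and `pipeline_like_step` are hypothetical names, and the noise predictions are plain lists of floats rather than tensors.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Standard classifier-free guidance, applied elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


def pipeline_like_step(noise_uncond, noise_cond, guidance_scale):
    """Mimics the gating described in the issue: CFG only when scale > 1."""
    do_classifier_free_guidance = guidance_scale > 1
    if do_classifier_free_guidance:
        return cfg_combine(noise_uncond, noise_cond, guidance_scale)
    # Otherwise only the prompt-conditioned prediction is used
    return list(noise_cond)


uncond, cond = [0.0, 1.0], [1.0, 3.0]
print(cfg_combine(uncond, cond, 0.0))         # [0.0, 1.0] -> unconditional
print(pipeline_like_step(uncond, cond, 0.0))  # [1.0, 3.0] -> conditional
print(pipeline_like_step(uncond, cond, 1.0))  # [1.0, 3.0] -> same as 0.0
```

Under this gating, guidance_scale 0.0 and 1.0 produce the same step, which matches the identical images in the report.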

a-r-r-o-w commented 2 weeks ago

This is indeed the case. You've already described the two supported modes: conditional generation with guidance (guidance_scale > 1.0) and conditional generation without guidance (guidance_scale <= 1.0). For unconditional generation, you can pass an empty prompt instead and set guidance_scale <= 1.0, or pass torch.zeros(...) of the right shape as prompt_embeds. I think the zeros approach is specific to certain models, and torch.zeros may not always produce coherent results, so an empty prompt is the better choice.
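The empty-prompt workaround might be wrapped like this. It is a minimal sketch assuming `pipe` is an already loaded `StableDiffusionXLPipeline`; `generate_unconditional` is a hypothetical helper, not a diffusers API, and the loading code is left in comments because it needs a GPU and the SDXL weights.

```python
def generate_unconditional(pipe, generator=None, num_inference_steps=50):
    """Draw one sample conditioned only on an empty prompt.

    `pipe` is assumed to be a StableDiffusionXLPipeline (or any callable
    taking the same keyword arguments). With guidance_scale=1.0 the
    pipeline skips classifier-free guidance, so each denoising step uses
    only the prediction for the empty prompt.
    """
    output = pipe(
        prompt="",              # empty prompt stands in for "no condition"
        guidance_scale=1.0,     # <= 1.0 disables the CFG branch
        generator=generator,    # fixed generator for reproducible samples
        num_inference_steps=num_inference_steps,
    )
    return output.images[0]


# Example usage (needs a CUDA GPU and the SDXL weights; not run here):
#
# import torch
# from diffusers import StableDiffusionXLPipeline
#
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0",
#     torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
# ).to("cuda")
# image = generate_unconditional(pipe, generator=torch.Generator().manual_seed(0))
```

If you try the torch.zeros(...) route instead, note that the SDXL pipeline also expects pooled_prompt_embeds whenever prompt_embeds is passed.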

a-r-r-o-w commented 3 days ago

Marking this as closed due to the explanation above and inactivity. Feel free to re-open it if there's anything else we can help with.
