huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Analysis of Classifier-Free Guidance Weight Schedulers #7756

Open rootonchair opened 5 months ago

rootonchair commented 5 months ago

Model/Pipeline/Scheduler description

The paper's authors perform an analysis and propose a one-line change that makes Classifier-Free Guidance results look better.

[image: cfg]

I personally ran some tests to confirm.

SD1.5, DDIM scheduler, 50 steps, prompt: "a photograph of an astronaut riding a horse", seed: 1024

guidance scale: 7.5

- static: [image: sd_default]
- linear (proposed): [image: sd_default_linear]

guidance scale: 14.0

- static: [image: sd_org]
- linear (proposed): [image: sd_linear]

Open source status

Provide useful links for the implementation

Paper: https://arxiv.org/abs/2404.13040
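
For context, a minimal sketch of what the one-line change amounts to in a standard CFG denoising loop, assuming a guidance weight that ramps linearly from roughly 0 up to the target scale over the sampling steps (the paper may parameterize the ramp differently; `linear_guidance_scale` is an illustrative name, not a diffusers API):

```python
# Minimal sketch (not the paper's reference code): replace the static CFG
# weight with one that grows linearly over the sampling steps, so early
# (high-noise) steps get little guidance and late steps get the full scale.
def linear_guidance_scale(step_index: int, num_inference_steps: int, max_scale: float = 7.5) -> float:
    # Assumed ramp from ~0 up to max_scale; the paper's exact schedule may differ.
    return max_scale * (step_index + 1) / num_inference_steps

# In a standard CFG loop, the usual static line
#   noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# becomes
#   w = linear_guidance_scale(i, num_inference_steps, guidance_scale)
#   noise_pred = noise_pred_uncond + w * (noise_pred_text - noise_pred_uncond)
```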

rootonchair commented 5 months ago

Updated SDXL results, guidance scale: 14.0

- static: [image: sdxl_org]
- linear: [image: sdxl_org_linear]

DN6 commented 5 months ago

cc: @yiyixuxu and @asomoza for visibility.

bghira commented 5 months ago

what a neat way to make use of the knowledge already in the model!

yiyixuxu commented 5 months ago

cc @asomoza can we make a callback for this?

asomoza commented 5 months ago

Yes, but there are a lot of techniques for manipulating the CFG, most of them without papers. I added the cutoff one because I know it's really popular, it makes generations faster, and it serves as a kind of example of how to manipulate the CFG.

Maybe we should let the community add these later on? Other popular ones are Automatic CFG and Dynamic Thresholding.
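
For reference, a rough sketch of what such a callback could look like, built on the generic `callback_on_step_end` hook and mutating the pipeline's private `_guidance_scale` attribute the same way the existing CFG cutoff callback does. The helper name and the linear ramp are illustrative assumptions, not an existing diffusers API:

```python
import torch
from diffusers import StableDiffusionPipeline

def make_linear_cfg_callback(max_scale: float):
    """Return a step-end callback that ramps the CFG scale linearly over the run.

    Illustrative sketch: it relies on the private `_guidance_scale` attribute,
    as the built-in CFG cutoff callback does.
    """
    def callback(pipeline, step_index, timestep, callback_kwargs):
        num_steps = pipeline.num_timesteps
        ramp = max_scale * (step_index + 1) / num_steps
        # callback_on_step_end fires after each step, so this sets the scale
        # used by the *next* step; the first step keeps the initial value.
        # Keep the scale above 1 so the pipeline keeps running the
        # unconditional pass (it skips CFG entirely when the scale <= 1).
        pipeline._guidance_scale = max(ramp, 1.001)
        return callback_kwargs
    return callback

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,
    guidance_scale=14.0,
    callback_on_step_end=make_linear_cfg_callback(14.0),
).images[0]
```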

YunhoKim21 commented 4 months ago

Hi, I'm interested in your work. Is there an explanation of WHY this phenomenon happens?

bghira commented 4 months ago

I'm not an expert on this, but at a cursory glance it seems to base the strength of guidance on the position in the timestep schedule. This also likely works because the model leans on two kinds of attention over the course of sampling: earlier timesteps rely heavily on cross-attention (the text conditioning), while later timesteps rely mostly on self-attention (practically ignoring the prompt).

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.