huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

SDXL max sigma value should be doubled for 1024px generations #9531

Open bghira opened 1 week ago

bghira commented 1 week ago

Describe the bug

https://arxiv.org/abs/2409.15997

As outlined by NovelAI for their SDXL-based model, sigma_max must be doubled for each doubling of the canvas edge length.
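
A minimal sketch of that scaling rule (a hypothetical helper; the 512px reference edge and the exact default value are assumptions, not taken from the paper): sigma_max grows linearly with the canvas edge length, so it doubles for each doubling of the edge.

DEFAULT_SIGMA_MAX = 14.6146  # SDXL's default training sigma_max
REFERENCE_EDGE = 512         # edge length the default sigma_max is assumed to match

def scaled_sigma_max(edge_length: int) -> float:
    # sigma_max scales linearly with the canvas edge length
    return DEFAULT_SIGMA_MAX * (edge_length / REFERENCE_EDGE)

print(scaled_sigma_max(1024))  # ~29.2, i.e. doubled for 1024px generations
print(scaled_sigma_max(2048))  # ~58.5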

Reproduction

N/A

Logs

No response

System Info

-

Who can help?

No response

bghira commented 1 week ago

@yiyixuxu how would this be handled, if i were to submit a PR?

yiyixuxu commented 1 week ago

Thanks @bghira, I would be very interested to have this in diffusers!

Can we first leverage the sigmas argument in the pipeline __call__ to pass custom sigmas to the scheduler? This way we can quickly understand how it works and compare the results:

https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L829

Once we have that, we can decide how to fit it into the scheduler design more natively.
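
A minimal sketch of that quick comparison (the model id, prompt, and factor of 2 are assumptions): run the pipeline once with its default schedule and once with a custom sigma list whose maximum is doubled, passed through the existing sigmas argument.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# build a custom schedule with the first (largest) sigma doubled
pipe.scheduler.set_timesteps(20)
custom_sigmas = pipe.scheduler.sigmas.clone()
custom_sigmas[0] = custom_sigmas[0] * 2

prompt = "a photo of an astronaut riding a horse"
baseline = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]
doubled = pipe(
    prompt,
    sigmas=custom_sigmas.tolist(),
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]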

hlky commented 1 week ago

If you're using Karras or exponential sigmas, you can pass sigma_max when instantiating the scheduler:

from diffusers import EulerDiscreteScheduler

# instantiate once to read the default max sigma
scheduler: EulerDiscreteScheduler = EulerDiscreteScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)
# re-instantiate with sigma_max doubled
scheduler: EulerDiscreteScheduler = EulerDiscreteScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True, sigma_max=scheduler.sigmas[0].item() * 2,
)
pipeline.scheduler = scheduler

This results in all the sigmas changing:

standard

scheduler.set_timesteps(4)
scheduler.sigmas
>>> tensor([14.6146,  3.1686,  0.4469,  0.0292,  0.0000])

sigma_max

scheduler.set_timesteps(4)
scheduler.sigmas
>>> tensor([2.9229e+01, 5.6572e+00, 6.5925e-01, 2.9168e-02, 0.0000e+00])

In my testing with the default XL scheduler (timestep_spacing="leading") and some local modifications to set the max sigma before calculation, there is no change to any of the sigmas. This is because the sigma max (~14.6) isn't used except when the number of steps is >= 999.
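
A quick way to see why (a sketch; the config values mirror SDXL's defaults and are an assumption here): with timestep_spacing="leading" and 20 steps, the schedule starts at timestep 951, so the largest training sigma is never sampled and overriding it changes nothing.

from diffusers import EulerDiscreteScheduler

sched = EulerDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    steps_offset=1, timestep_spacing="leading",
)
sched.set_timesteps(20)
print(sched.timesteps[0])  # 951, not 999
print(sched.sigmas[0])     # below the ~14.6 training sigma_max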

With timestep_spacing="linspace", only the first of the resulting sigmas changes, so without any modification this is already possible by passing the sigmas to the pipeline:

scheduler.set_timesteps(20)
# clone so the scheduler's own sigma tensor isn't modified in place
sigmas = scheduler.sigmas.clone()
sigmas[0] = sigmas[0] * 2  # double the first (largest) sigma

image = pipeline(
    ...,
    sigmas=sigmas,
).images[0]

Considering that only one sigma changes in this case, it might be best to just do that; but it could be fully integrated by adding sigma_max to set_timesteps and setting sigmas[-1] = sigma_max, then adding sigma_max to the pipelines and passing it through via retrieve_timesteps.
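
A rough, standalone sketch of that integration path (a hypothetical helper; the real change would live inside set_timesteps and retrieve_timesteps): recompute the training sigmas, override the largest one with sigma_max, interpolate a schedule, and feed it to the pipeline through the existing sigmas argument.

import numpy as np
from diffusers import EulerDiscreteScheduler

def sigmas_with_sigma_max(scheduler: EulerDiscreteScheduler, num_steps: int, sigma_max: float):
    # training sigmas in ascending order, as set_timesteps computes them
    sigmas = np.array(((1 - scheduler.alphas_cumprod) / scheduler.alphas_cumprod) ** 0.5)
    sigmas[-1] = sigma_max  # override the largest training sigma
    # linspace-style sampling of the (modified) sigma curve
    timesteps = np.linspace(0, len(sigmas) - 1, num_steps)[::-1]
    sampled = np.interp(timesteps, np.arange(len(sigmas)), sigmas)
    return np.concatenate([sampled, [0.0]]).tolist()

# usage: image = pipeline(..., sigmas=sigmas_with_sigma_max(pipeline.scheduler, 20, 29.2)).images[0]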

Just my thoughts as someone who has contributed to scheduler code recently.

bghira commented 1 week ago

SDXL uses the leading and not the linspace schedule, which is pretty old and not used by any common models (it was replaced by squared cosine v2 at some point).

The only way to get the 1000th timestep into the schedule is with trailing spacing, but this also probably requires a ZSNR-trained model.

terminus-xl-velocity-v2 used v-prediction and trailing timestep spacing, which might make it a better source model for this particular experiment?

Though base SDXL should also be investigated, probably with trailing spacing.
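
For reference, a small check of the trailing-spacing point above (a sketch; the config values mirror SDXL's defaults and are an assumption): with timestep_spacing="trailing" the schedule includes timestep 999, so the largest training sigma, or a doubled replacement for it, actually enters the sampled sigmas.

from diffusers import EulerDiscreteScheduler

sched = EulerDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    timestep_spacing="trailing",
)
sched.set_timesteps(20)
print(sched.timesteps[0])  # 999
print(sched.sigmas[0])     # ~14.6, the training sigma_max, now actually used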

bghira commented 1 week ago

Also, it feels like the whole schedule needs to be rescaled so that the new max propagates through the schedule rather than just impacting the first timestep, unless I'm misunderstanding.
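
One possible interpretation of that rescaling, as a sketch (the log-space remap is an assumption, not something settled in this thread): stretch every sigma so the schedule ends at the new maximum while the minimum stays put, instead of only bumping the first value.

import torch

def rescale_sigmas(sigmas: torch.Tensor, new_sigma_max: float) -> torch.Tensor:
    # sigmas are descending: [sigma_max, ..., sigma_min, 0.0]
    nonzero = sigmas[sigmas > 0]
    log_s = nonzero.log()
    log_min, log_max = log_s[-1], log_s[0]
    new_log_max = torch.log(torch.tensor(new_sigma_max))
    # linear remap in log-space: keep sigma_min fixed, move sigma_max to the new value
    rescaled = (log_s - log_min) / (log_max - log_min) * (new_log_max - log_min) + log_min
    out = sigmas.clone()
    out[: len(nonzero)] = rescaled.exp()
    return out

# usage: sigmas = rescale_sigmas(scheduler.sigmas, scheduler.sigmas[0].item() * 2)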