Artifacts with DPM++ 2M SDE Karras, even when using `use_lu_lambdas`

ivanprado commented 11 months ago

Describe the bug

The results obtained using DPM++ 2M SDE Karras contain artifacts that suggest there is a bug. This happens across different XL models, although it is less visible in base SDXL. Automatic1111 with the same models and the same kind of scheduler produces high-quality results, and it seems impossible to achieve the same quality using the diffusers equivalent.

An image generated using Automatic1111 DPM++ 2M SDE Karras with the Juggernaut XL v6 model: adorable_infant_dpm++_2M_SDE_Karras_jugger6xl

An image generated using the equivalent scheduler from diffusers: adorable_infant_dpm++_2M_SDE_Karras_jugger6xl_diffusers

The artifacts are very visible in the last image.

The following didn't help:

Loading the model in fp32
use_karras_sigmas=False
use_karras_sigmas=False and dpmsolver++
euler_at_final=True and use_lu_lambdas=True

I see this happening clearly at least with the following models:

frankjoshua/juggernautXL_version6Rundiffusion
Lykon/dreamshaper-xl-1-0

Reproduction

I've created a github repo to help reproduce the problem. A simpler code snippet is also provided below:

from diffusers import AutoPipelineForText2Image
from diffusers.schedulers import DPMSolverMultistepScheduler
import torch

model_id = "frankjoshua/juggernautXL_version6Rundiffusion"
pipeline = AutoPipelineForText2Image.from_pretrained(
    model_id, 
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
        pipeline.scheduler.config, 
        use_karras_sigmas=True,
        sde_type="sde-dpmsolver++",
        euler_at_final=True,
        use_lu_lambdas=True
        )

prompt = "Adorable infant playing with a variety of colorful rattle toys."

results = pipeline(
    prompt=prompt, 
    guidance_scale=3,
    generator=torch.Generator(device="cuda").manual_seed(42),  
    num_inference_steps=25, 
    height=768, 
    width=1344)
display(results.images[0])

Logs

No response

System Info

diffusers version: 0.24.0
Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.31
Python version: 3.10.13
PyTorch version (GPU?): 2.1.1 (True)
Huggingface_hub version: 0.20.1
Transformers version: 4.36.2
Accelerate version: 0.25.0
xFormers version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @patrickvonplaten

spezialspezial commented 11 months ago

I second this. I was close to submitting an issue myself. Interestingly, KDPM2AncestralDiscreteScheduler and DPMSolverSDEScheduler (edit: at twice the cost, mind you) are not as bad as other schedulers for SDXL; both are close to the original Katherine Crowson implementation. Fooocus, Comfy, and Auto1111 all have crisp output using dpmpp_2m variants.

spezialspezial commented 11 months ago

DPMSolverMultistep-Karras

DPMSolverSDE-Karras

Fooocus_dpmpp_2m_sde_gpu

spezialspezial commented 11 months ago

DPMSolverMultistep-SDE-Lula - better but still no dice

Prompt: close-up photo portrait of a soccer player Negative: bad, blurry Seed: 1, cuda Guidance: 5 Steps: 40 Model: juggernautxl_version6rundiffusion

spezialspezial commented 11 months ago

Likely the same issue, with an investigation and a proposed fix: https://github.com/huggingface/diffusers/issues/5689

Any chance we could just get the original k_samplers ported as diffusers schedulers or does that come with any legal issues?

sayakpaul commented 11 months ago

Cc: @LuChengTHU @yiyixuxu

mar-muel commented 11 months ago

Hey there - Have been looking into this as well

Shouldn't the argument be algorithm_type instead of sde_type? @ivanprado
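One way to catch a misnamed argument like this before generating images is to compare the keyword names against the constructor's signature. The sketch below is only an illustration: `unknown_kwargs` is a hypothetical helper, and `StandInScheduler` is a stand-in that copies a few argument names from the snippets in this thread, not the real diffusers class.

```python
import inspect


def unknown_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ does not accept (hypothetical helper)."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all accepts any name, so nothing can be flagged in that case.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


class StandInScheduler:
    """Stand-in with a few argument names taken from this thread (assumption)."""

    def __init__(self, algorithm_type="dpmsolver++", use_karras_sigmas=False,
                 euler_at_final=False, use_lu_lambdas=False):
        self.algorithm_type = algorithm_type


# The keyword arguments from the original reproduction snippet.
requested = {
    "use_karras_sigmas": True,
    "sde_type": "sde-dpmsolver++",
    "euler_at_final": True,
    "use_lu_lambdas": True,
}

print(unknown_kwargs(StandInScheduler, requested))  # ['sde_type']
```

With diffusers installed, the same check could presumably be pointed at the real scheduler class, and printing `pipeline.scheduler.config` after the `from_config` call should show which algorithm type actually took effect.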

yiyixuxu commented 11 months ago

Here is what I got with euler_at_final

from diffusers import AutoPipelineForText2Image
from diffusers.schedulers import DPMSolverMultistepScheduler
import torch

model_id = "frankjoshua/juggernautXL_version6Rundiffusion"
pipeline = AutoPipelineForText2Image.from_pretrained(
    model_id, 
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
        pipeline.scheduler.config, 
        algorithm_type="sde-dpmsolver++",
        euler_at_final=True,
        )

prompt = "Adorable infant playing with a variety of colorful rattle toys."

results = pipeline(
    prompt=prompt, 
    guidance_scale=3,
    generator=torch.Generator(device="cuda").manual_seed(42),  
    num_inference_steps=25, 
    height=768, 
    width=1344)
results.images[0].save("yiyi_test_9_out.png")

yiyi_test_9_out

yiyixuxu commented 11 months ago

@spezialspezial

You can use k samplers in diffusers with https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_k_diffusion/pipeline_stable_diffusion_k_diffusion.py

yiyixuxu commented 11 months ago

@ivanprado

Would you be able to share your auto1111 settings exactly as they are?

We looked into the difference between auto1111 and diffusers before and concluded that the same DPM issue exists in auto1111/k-sampler. However, the output you provided here is a lot better than what we generate with use_karras_sigmas=True in diffusers, and better than with euler_at_final or use_lu_lambdas, so I wonder if anything changed and would like to revisit this.
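For context on what a "Karras" schedule means here: the name refers to the noise-level spacing proposed by Karras et al. (2022), which interpolates linearly in sigma^(1/rho) between the largest and smallest noise level. The sketch below illustrates that formula only and is not the diffusers or k-diffusion implementation; the function name and the sigma range used in the example are illustrative assumptions.

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Return n noise levels, decreasing from sigma_max to sigma_min.

    Interpolates linearly in sigma**(1/rho), then raises back to the power rho.
    """
    if n < 2:
        raise ValueError("need at least two steps")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    return [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]


# Illustrative range only (assumption), with 25 steps as in the report above.
sigmas = karras_sigmas(25, sigma_min=0.03, sigma_max=14.6)
print(sigmas[0], sigmas[-1])
```

With rho = 7 the steps are large at high noise and become progressively smaller towards the low-noise end, which is where fine detail is resolved. That may be why differences in how the final steps are handled show up as visible artifacts.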

ivanprado commented 11 months ago

@yiyixuxu I used:

Adorable infant playing with a variety of colorful rattle toys.
Steps: 25, Sampler: DPM++ 2M SDE Karras, CFG scale: 3, Seed: 42, Size: 1344x768, Model hash: 1fe6c7ec54, Model: Juggernaut_XL_6, Version: v1.5.1
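To compare the two tools on equal terms, the settings line above can be turned into the keyword arguments used in the diffusers snippets in this thread. This is a minimal sketch with hypothetical helper names; it assumes `Size` is written as width x height and that no value contains a comma.

```python
def parse_a1111_settings(line):
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings = {}
    for field in line.split(", "):
        key, _, value = field.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


def to_pipeline_kwargs(settings):
    """Map the settings onto the pipeline arguments used earlier in this thread."""
    width, height = (int(v) for v in settings["Size"].split("x"))
    kwargs = {
        "num_inference_steps": int(settings["Steps"]),
        "guidance_scale": float(settings["CFG scale"]),
        "width": width,
        "height": height,
    }
    # The seed is returned separately because it goes into the torch generator.
    return kwargs, int(settings["Seed"])


line = ("Steps: 25, Sampler: DPM++ 2M SDE Karras, CFG scale: 3, Seed: 42, "
        "Size: 1344x768, Model hash: 1fe6c7ec54, Model: Juggernaut_XL_6, "
        "Version: v1.5.1")
settings = parse_a1111_settings(line)
kwargs, seed = to_pipeline_kwargs(settings)
print(settings["Sampler"], kwargs, seed)
```

The sampler name still has to be mapped to a scheduler configuration by hand. Matching the seed does not guarantee matching initial noise either, since the two tools may generate it differently.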

aycaecemgul commented 11 months ago

I am having the same problem: Automatic1111's versions of the schedulers greatly outperform HF's. I wanted to use DPM++ 2M SDE Karras to generate photorealistic results, but it yields unrealistic results with less detail. The updates provided in older PRs do not fix this issue. Sadly, this makes it less appealing to use diffusers with SDXL.

amlarraz commented 11 months ago

About this issue, here are a recap, some thoughts, and experiments:

automatic1111 uses the k-diffusion samplers.
The StableDiffusionKDiffusionPipeline is created for SD1.5 and 2.1 but not for SDXL.

We thought that the k-diffusion samplers could fix the problem or, at least, reduce the quality gap. We've created a pipeline combining the StableDiffusionXLPipeline and the StableDiffusionKDiffusionPipeline, and the results are OK: the noise has disappeared, as you can see in the image. The generation parameters are the same as the ones @ivanprado shared.

yiyixuxu commented 11 months ago

@amlarraz do you want to upstream the k-diffusion pipeline for sdxl? it is on my to-do to add one. it will help us to benchmark against k-diffusion

yiyixuxu commented 11 months ago

@amlarraz I added it here, I will use it to debug our scheduler. feel free to give it a review :)

https://github.com/huggingface/diffusers/pull/6447

amlarraz commented 11 months ago

Hey @yiyixuxu, sorry for my late response. Amazing!!! I've checked your code and left some comments; your code and ours are mostly the same.
