huggingface / diffusers


DPM solver ++ third order with SDE #8288

Open christopher5106 opened 4 months ago

christopher5106 commented 4 months ago

I see this in A1111... is it possible to get the missing implementation here: https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L887 ? Thanks. (It reportedly gives better results than 2nd order: https://www.reddit.com/r/StableDiffusion/comments/16amqso/i_think_dpm_3m_sde_karras_is_producing_better/)
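For reference, here is roughly what the sampler looks like, paraphrased from k-diffusion's `sample_dpmpp_3m_sde` (which is what A1111 and ComfyUI wrap). This is only a sketch: I've replaced the Brownian-tree noise sampler with plain Gaussian noise to keep it self-contained, so outputs won't match upstream exactly, and `model(x, sigma)` is assumed to return the denoised (x0) prediction as in k-diffusion's model wrappers:

```python
import torch

@torch.no_grad()
def dpmpp_3m_sde(model, x, sigmas, eta=1.0, s_noise=1.0):
    """Sketch of DPM-Solver++(3M) SDE, paraphrased from k-diffusion.

    `model(x, sigma)` is assumed to return the denoised (x0) prediction.
    Plain Gaussian noise stands in for k-diffusion's Brownian-tree sampler.
    """
    denoised_1, denoised_2 = None, None  # model outputs from the two previous steps
    h_1, h_2 = None, None                # log-sigma step sizes from the two previous steps

    for i in range(len(sigmas) - 1):
        denoised = model(x, sigmas[i])
        if sigmas[i + 1] == 0:
            x = denoised  # final full-denoise step
        else:
            t, s = -sigmas[i].log(), -sigmas[i + 1].log()
            h = s - t
            h_eta = h * (eta + 1)

            # First-order (exponential Euler) part of the update.
            x = torch.exp(-h_eta) * x + (-h_eta).expm1().neg() * denoised

            if h_2 is not None:
                # Third-order correction using the two previous model outputs.
                r0, r1 = h_1 / h, h_2 / h
                d1_0 = (denoised - denoised_1) / r0
                d1_1 = (denoised_1 - denoised_2) / r1
                d1 = d1_0 + (d1_0 - d1_1) * r0 / (r0 + r1)
                d2 = (d1_0 - d1_1) / (r0 + r1)
                phi_2 = h_eta.neg().expm1() / h_eta + 1
                phi_3 = phi_2 / h_eta - 0.5
                x = x + phi_2 * d1 - phi_3 * d2
            elif h_1 is not None:
                # Second-order correction while the history warms up.
                r = h_1 / h
                d = (denoised - denoised_1) / r
                phi_2 = h_eta.neg().expm1() / h_eta + 1
                x = x + phi_2 * d

            if eta:
                # Stochastic (SDE) part: inject fresh noise scaled to the step.
                noise = torch.randn_like(x)
                x = x + noise * sigmas[i + 1] * (-2 * h * eta).expm1().neg().sqrt() * s_noise

            denoised_1, denoised_2 = denoised, denoised_1
            h_1, h_2 = h, h_1
    return x
```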

yiyixuxu commented 4 months ago

cc @asomoza, is this popular in the community? Is it worth adding here?

christopher5106 commented 4 months ago

Check the results here https://medium.com/@furkangozukara/sampling-method-dpm-2m-sde-vs-dpm-3m-sde-with-schedule-type-uniform-vs-karras-vs-exponential-09ab0ac379dd

asomoza commented 4 months ago

It's not that popular; almost all the good images I see don't use it, but some people have claimed it's better.

I don't think the comparison done in that post is really good. First, to really see whether the result is good or not, you shouldn't use a LoRA and the same image and composition all the time. I get that he needs to self-promote, but I don't think it's really scientific to put himself in all the images for this.

My experience with it is not good, so I don't use it anymore; there are a couple of issues with it that I don't like.

For example, what I like to use to compare samplers is a subject over a plain color background, so we can see the colors and whether there are artifacts.

I'll use ComfyUI for a couple of examples where I know it fails the most; both images in each pair use the same seed.
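As an aside, if you want to reproduce this kind of fixed-seed comparison in diffusers instead of ComfyUI, something like the sketch below should work. The model id and prompt are just placeholders, and since 3M SDE isn't implemented in diffusers yet, it compares the existing 2M and 2M SDE variants; 3M SDE would presumably be `algorithm_type="sde-dpmsolver++"` with `solver_order=3` once it lands:

```python
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

# Placeholder model id and prompt; swap in whatever you are testing.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
prompt = "a cat sitting on a chair, plain pink background"

configs = {
    "dpmpp_2m_karras": dict(algorithm_type="dpmsolver++", solver_order=2, use_karras_sigmas=True),
    "dpmpp_2m_sde_karras": dict(algorithm_type="sde-dpmsolver++", solver_order=2, use_karras_sigmas=True),
}

for name, overrides in configs.items():
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, **overrides)
    # Fixed seed so only the scheduler changes between runs.
    generator = torch.Generator("cuda").manual_seed(3)
    image = pipe(prompt, num_inference_steps=20, guidance_scale=6.5, generator=generator).images[0]
    image.save(f"{name}.png")
```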

20 steps, 6.5 cfg, karras

[images: dpmpp_2m_sde_gpu vs dpmpp_3m_sde_gpu]

Here the color spills more; the cat is more bluish in the 3M version.

20 steps, 8 cfg, karras

[images: dpmpp_2m_sde_gpu vs dpmpp_3m_sde_gpu]

In this one, 3M adds a little more contrast and more detail, but it fails with the background, since I asked for a plain pink background.

30 steps, 8 cfg, karras

[images: dpmpp_2m_sde_gpu vs dpmpp_3m_sde_gpu]

Here we can see that the 3M result is burned: too much brightness, and it loses a lot of detail. So to fix it we need a lot more steps.

50 steps, 8 cfg, karras

[images: dpmpp_2m_sde_gpu vs dpmpp_3m_sde_gpu]

Now it's fixed, but the result is almost the same as 2M, and why bother if we get a good result with 2M at half the steps? You can also lower the CFG to fix it:

25 steps, 6 cfg, karras

[images: dpmpp_2m_sde_gpu vs dpmpp_3m_sde_gpu]

Overall it's a lot more finicky, and most of the time I don't really see a difference when they're both good, so I just stick with 2M or other samplers.

I would also like the opinion of people with good eyes, like @bghira, since I don't consider myself much of an expert at evaluating samplers.

christopher5106 commented 4 months ago

@asomoza can you confirm your tests are not affected by the current regression? https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/15519

asomoza commented 4 months ago

Oh, I need to clarify that: I don't use automatic1111 anymore; I used ComfyUI for the examples. Also, I don't really use SD 1.5 that much, so I'm not qualified to test with those models, because to my eyes they all look bad. For example, in that thread people said it lost "realistic textures", but honestly I see all of them as plastic and fake.

I read that some people wrote it's better with hands, so I'll test that later with both models; at least in SD 1.5 I can tell if the hands are bad.

Lastly, I didn't test with anime because I'm not that good at generating anime images, and also I think SD 1.5 and SDXL are mostly perfect at generating them with most samplers.

I encourage more people to do some tests and post their findings here with high-res images. Don't use posts from other people, and don't use LoRAs, because LoRA training can make the results bad even if the sampler is good.

christopher5106 commented 4 months ago

Thanks :) very interesting points. Originally I wanted this 3M SDE option for an upscaler with SD 1.5.

bghira commented 4 months ago

I took a look, and I'm wondering if there's a use case beyond still frames where this works better. If it works for upscaling, it's nice to have options, but the colour bleed seems substantial. Has it been tested on e.g. SV3D or SVD? ControlNet commonly uses UniPC; maybe it could work there too. But if not, we're negating the benefits of UniPC by switching to a slower sampler.
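If someone wants to try the ControlNet angle, a rough sketch of the scheduler swap in diffusers is below. The model ids are just common examples, `canny_image` is a placeholder conditioning image, and 2M SDE stands in until 3M SDE exists here:

```python
import torch
from PIL import Image
from diffusers import (
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

# Example model ids; any SD 1.5 checkpoint with a matching ControlNet works.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

canny_image = Image.new("RGB", (512, 512))  # placeholder; use a real edge map

schedulers = {
    # Baseline: UniPC, the scheduler the ControlNet examples commonly use.
    "unipc": UniPCMultistepScheduler.from_config(pipe.scheduler.config),
    # Candidate: DPM++ SDE; solver_order=2 today, 3 once 3M SDE is implemented.
    "dpmpp_sde": DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, algorithm_type="sde-dpmsolver++", solver_order=2
    ),
}

for name, scheduler in schedulers.items():
    pipe.scheduler = scheduler
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(
        "a photo of a living room", image=canny_image,
        num_inference_steps=20, generator=generator,
    ).images[0]
    image.save(f"controlnet_{name}.png")
```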

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.