mcmonkeyprojects / sd-dynamic-thresholding

Dynamic Thresholding (CFG Scale Fix) for Stable Diffusion (SwarmUI, ComfyUI, and Auto WebUI)
MIT License
1.12k stars, 108 forks

SDE samplers? #11

Closed by art926 1 year ago

art926 commented 1 year ago

I've tried it with different samplers, and it works better with some than others (and with some it doesn't work at all). What I've noticed about SDE is that it outputs amazing (!) results halfway through (I have set the option to output the intermediate results), but in about the last third of the process the image gets totally destroyed and turns into cartoonish garbage. And it happens with all images, prompts, and script settings. It's almost as if SDE tweaks CFG on its own and interferes with the script in the last steps, maybe even making it... negative? I don't know the whole math behind it, but that's how it feels. Is it possible to fix? The intermediate SDE results I see are sooo good, I'd really like to have this script working with that sampler (Karras or non-Karras). Please!

mcmonkey4eva commented 1 year ago

Oh, wow, yeah, I can replicate that. Looks like SDE is too strong during the final detail steps.

This isn't something that necessarily can or should be fixed in the code right now, but it will likely be a focus of a planned future update to the extension that adds an option to automatically optimize scale settings.

One thing you can do right now to combat this is to open the advanced settings and use something like this:

[Screenshot: advanced settings]

Note the use of Down schedulers, with Minimum values set to more reasonable scale values.

This will essentially run the first few steps at the full scale you specified, and then gradually move down towards the minimum value for the final steps.

"Cosine Down" will change the scale in a pattern like the red line, and "Half Cosine Down" will follow the blue line (note that it only moves halfway towards the minimum, so you will have to set the minimum lower to compensate):

[Graph: scale over steps for Cosine Down (red) and Half Cosine Down (blue)]

(Note also that red goes down quicker, whereas blue stays high longer)
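To make the red/blue description concrete, here is a minimal sketch of "Down" scheduler curves that behave as described: full scale on the first step, then a move towards the minimum, with Half Cosine Down only covering roughly half the distance and staying high longer. The function name `scheduled_scale` and the exact formulas are assumptions chosen to match the description; this is not the extension's own code.

```python
import math

def scheduled_scale(scale: float, minimum: float, mode: str, step: int, total_steps: int) -> float:
    """Scale to use at 0-based `step` out of `total_steps`.

    Hypothetical curves matching the behaviour described in the thread,
    not copied from the extension.
    """
    frac = step / max(total_steps - 1, 1)  # 0.0 on the first step, 1.0 on the last
    span = scale - minimum                 # distance between full scale and the minimum
    if mode == "Constant":
        factor = 1.0
    elif mode == "Linear Down":
        factor = 1.0 - frac
    elif mode == "Cosine Down":
        # Quarter cosine wave: starts at 1 and reaches 0 on the final step (red line)
        factor = math.cos(frac * math.pi / 2)
    elif mode == "Half Cosine Down":
        # cos(frac) only falls from 1 to cos(1) (about 0.54), so it moves roughly halfway (blue line)
        factor = math.cos(frac)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return minimum + span * factor

# Compare both curves for CFG 30 with a minimum of 7 over 5 steps
for mode in ("Cosine Down", "Half Cosine Down"):
    print(mode, [round(scheduled_scale(30, 7, mode, s, 5), 2) for s in range(5)])
```

Because Half Cosine Down never reaches the minimum in this sketch, a lower minimum is needed to end at a similar final scale, which matches the note above about compensating.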

In a couple quick tests I ran myself, this seems to fix the overburn when using SDE.

In theory this should get you results that retain the intermediate result quality you like, without burning the details.

If that's not enough, you might consider using an extension like https://github.com/klimaleksus/stable-diffusion-webui-anti-burn or https://github.com/AlUlkesh/sd_save_intermediate_images to sidestep the problem entirely and keep the result of a step before the last one.

art926 commented 1 year ago

Please look into it again, if you can. When I output the intermediate results, I see SDE producing amazing results in the first half of its steps, up to around 50-70%, and then the last few steps trash the results badly. I really think the actual CFG becomes opposite/negative: instead of, let's say, "a human", it decides to generate something totally non-human in the last 5 steps, turning people into robots or monsters of all sorts. Is there a way to print the actual CFG used at every step to the console, after your code corrects it?

mcmonkey4eva commented 1 year ago

There's nothing unique in DynThresh about SDE vs others in terms of scale handling. Scales are definitely not going wrong in SDE in particular. If SDE does do its own scaling (unlikely?) that kinda just makes it incompatible with external changes to the scale (by this extension or anything like it).

art926 commented 1 year ago

I figured out that the issue comes from any DPM sampler (except the 2M one)! It's very easy to reproduce with this approach:

Positive prompt: a human, photo
Negative prompt: 3d
Steps: 60 (and I output every 20 intermediate steps, so I can see how it converges)
Sampler: any DPM (except 2M); I was especially focused on SDE

For constant, normal (7-12) CFG scales you'll see good pictures, nothing special (the prompt is simple, after all). At CFG 30 all colors and shapes get "burned", just as expected. Now enable your extension and keep CFG at 30, with:
Mimic CFG scale: 7
Top percentile: 100
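For readers wondering what "Mimic CFG scale" and "Top percentile" do to a CFG 30 result, below is a simplified, hypothetical sketch of the general dynamic-thresholding idea: compute the guided result at both the real scale and the mimic scale, center each, clamp the high-scale values at the chosen percentile of their magnitude, then rescale so their spread matches the mimic-scale result. It works on a single plain list standing in for one latent channel; the helper names `percentile` and `dynamic_threshold` are made up, and this is an illustration under those assumptions, not the extension's actual implementation.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a list (pct in 0..100)."""
    ordered = sorted(values)
    if pct >= 100:
        return ordered[-1]
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def dynamic_threshold(cond, uncond, cfg_scale, mimic_scale, top_percentile):
    """Hypothetical single-channel sketch of dynamic thresholding."""
    # Standard CFG combination at the real (high) scale and at the mimic scale
    cfg = [u + (c - u) * cfg_scale for c, u in zip(cond, uncond)]
    mim = [u + (c - u) * mimic_scale for c, u in zip(cond, uncond)]
    cfg_mean = sum(cfg) / len(cfg)
    mim_mean = sum(mim) / len(mim)
    # Center both results around their own means
    cfg_centered = [v - cfg_mean for v in cfg]
    mim_centered = [v - mim_mean for v in mim]
    mim_max = max(abs(v) for v in mim_centered)
    # "Top percentile" picks the magnitude treated as the high-scale maximum
    cfg_max = percentile([abs(v) for v in cfg_centered], top_percentile)
    limit = max(cfg_max, mim_max)
    if limit == 0:
        return cfg
    # Clamp outliers, rescale to the mimic-scale spread, then restore the mean
    return [max(-limit, min(limit, v)) / limit * mim_max + cfg_mean for v in cfg_centered]

print(dynamic_threshold([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], cfg_scale=30, mimic_scale=7, top_percentile=100))
```

With Top percentile at 100, nothing is clipped in this sketch; the CFG 30 values are only squeezed back to the range that CFG 7 would have produced.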

If we keep both the Mimic Scale Scheduler and the CFG Scale Scheduler at Constant, the picture will look OK, but the colors and shapes get burned in the last steps, which is expected. But now, when you set the schedulers to Linear Down or Cosine Down, you'll see some magic! In the very last steps, the photos turn into clear 3D pictures, very creepy looking! So the negative "3d" becomes positive, and it also tries to make the picture non-human and non-photo! If you put "painting" instead of "3d" in the negative prompt, you'll see the picture become a painting. Clearly, the DPM samplers invert the condition after a certain step!

Now the most interesting part: a solution to this issue. Just set the scheduler minimum to half (exactly half) of the initial CFG! So I set my initial CFG to 30, then set those minimums to 15 (actually, you can set the Mimic scheduler minimum lower, but then the colors will be too low in contrast), and then I get beautiful pictures! Notice that if you switch the sampler to Euler while keeping these minimums at 15, the shapes and colors get burned again. So the DPM samplers definitely do some tricks with the CFG on their own.
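The "exactly half" rule is plain arithmetic on the scheduler endpoints, and tabulating the schedule offline is also a rough substitute for printing the per-step scale asked about earlier in the thread. The sketch below assumes simple linear and quarter-cosine curves running from the full scale to the minimum; `down_schedule` is a hypothetical helper, not the extension's code, and the real schedulers may differ in detail.

```python
import math

def down_schedule(scale, minimum, steps, mode="Linear Down"):
    """Hypothetical per-step scale list for a 'Down' scheduler."""
    out = []
    for step in range(steps):
        frac = step / max(steps - 1, 1)  # 0.0 on the first step, 1.0 on the last
        if mode == "Linear Down":
            factor = 1.0 - frac
        elif mode == "Cosine Down":
            factor = math.cos(frac * math.pi / 2)
        else:
            raise ValueError(f"unsupported mode: {mode}")
        out.append(minimum + (scale - minimum) * factor)
    return out

# The reported setup: CFG 30, minimum 15 (exactly half), 60 steps
schedule = down_schedule(30, 15, 60, "Linear Down")
for step in (0, 19, 39, 59):
    print(f"step {step + 1:2d}: scale {schedule[step]:.2f}")
```

Under these assumptions the first step runs at 30 and the final step lands at 15, so the last, detail-heavy steps run at half the starting scale.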

I think all of this is worth mentioning in your extension description.