sorry for not explaining. the values in parentheses are the t_min/t_max next to the min/max values for the actual segment sigmas, to illustrate the difference.
i've made a fair amount of progress on https://github.com/ssitu/ComfyUI_restart_sampling/pull/18, i think at this point it actually is functional though it still needs testing and cleanup.
i also added a self-test function and ran it on every scheduler/restart scheduler combination, with the default and a1111 segments, from 2 to 200 steps, comparing the recovered plan, and it seems to work.
the simple_test schedule just... doesn't seem to work though. what is it supposed to do? using it creates an invalid plan a lot of the time. also, even with other schedulers it seems possible for restart sigmas to end lower than the next normal segment. i had to adjust the self-test sensitivity for that to only check up to two decimal places.
simple_test was just an alternative to simple, because I wasn't sure which one was more appropriate. Now that you're running into issues, I think we can remove simple_test. So as of right now, you are recovering the correct sigmas? It was just simple_test where it didn't?
So as of right now, you are recovering the correct sigmas?
yes, at least so far as i have tested. i think the approach works but... unfortunately i'm not really good at that sort of abstract stuff and it's not mathematically proven correct.
It was just simple_test where it didn't?
the plan from simple_test seems like it just straight up is incorrect a lot of the time. so the issue is originally creating the plan, not recreating the plan from the sigmas.
just for example, 20 steps, simple_test/simple_test, default segments:
** Dumping restart sampling plan (total steps 26):
[ ] Step 1..18 : 14.53, 10.9, 8.303, 6.443, 5.088, 4.082, 3.321, 2.735, 2.276, 1.91, 1.613, 1.367, 1.161, 0.9838, 0.8299, 0.6932, 0.5693, 0.4538, 0.3417
[R 1] Step 19..20 : 0.5884, 0.5039, 0.4224
[ ] Step 21..22 : 0.3417, 0.223, 0.0
[R 1] Step 23..24 : 0.2983, 0.2332, 0.158
[R 2] Step 25..26 : 0.2983, 0.2332, 0.158
edit: just double checked and i can confirm it is the same with the main branch. i'm pretty sure it just never worked since i started using restart with comfyui (so before any of my pulls). maybe comfy did something that broke it at some point if it was okay before.
how about this? we can force the last sigma in the restart segment to match s_min, which simplifies life considerably. it seems like it would have to be wrong for the restart segment to go lower than the normal segment, right?
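roughly what i mean (just a sketch - calc_sigmas here stands in for the scheduler call, not the exact code in the pull):

def build_restart_sigmas(calc_sigmas, s_min, s_max, n_restart):
    # ask the scheduler for the restart segment's sigmas, dropping the trailing zero it appends
    restart_sigmas = calc_sigmas(s_min, s_max, n_restart)[:-1]
    # force the segment to end exactly at s_min so it can't dip below the next normal segment
    restart_sigmas[-1] = s_min
    return restart_sigmas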
i also realized it's not unreasonable for sampling to end with restart segments but they do have to reach s_min (s_min would presumably be 0 at the end). i needed to make another change to prevent exponential from blowing up on math.log(0) by using a tiny value instead. actually not sure why that wasn't an issue before but the exponential scheduler is:
def get_sigmas_exponential(n, sigma_min, sigma_max, device='cpu'):
    sigmas = torch.linspace(math.log(sigma_max), math.log(sigma_min), n, device=device).exp()
    return append_zero(sigmas)
so it'll immediately do math.log(0) if sigma_min is 0.
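the workaround amounts to something like this (sketch only - 1e-5 is an arbitrary tiny value, not necessarily what the pull uses):

import math
import torch

def get_sigmas_exponential_safe(n, sigma_min, sigma_max, device='cpu'):
    # clamp sigma_min to a tiny positive value so math.log never sees 0
    sigma_min = max(sigma_min, 1e-5)
    sigmas = torch.linspace(math.log(sigma_max), math.log(sigma_min), n, device=device).exp()
    # same effect as k_diffusion's append_zero: the schedule still ends at 0
    return torch.cat([sigmas, sigmas.new_zeros([1])])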
with these changes it passes the plan building and recovery from sigmas for every permutation including simple_test. simple_test also works now and produces reasonable results.
we could also force the restart segment first sigma to t_max. i don't currently have that enabled, but i believe everything still works doing that. however, i don't know what the rules are for diffusion sampling - is it wrong to do that? we have to for s_min i think, but there might be leeway with that part.
with these changes it passes the plan building and recovery from sigmas for every permutation including simple_test. simple_test also works now and produces reasonable results.
I'll probably just remove simple_test after this PR.
i also realized it's not unreasonable for sampling to end with restart segments but they do have to reach s_min (s_min would presumably be 0 at the end). i needed to make another change to prevent exponential from blowing up on math.log(0) by using a tiny value instead. actually not sure why that wasn't an issue before
I would reason that s_min is always non-zero. Looks like s_min comes from the model config, and I'm guessing it's the smallest noise level that the model was trained on during the noising process. In the k_diffusion code, zero is appended to every schedule type using the append_zero function to achieve the full denoise. That's going to cause problems if the restart segment's s_min is zero. I guess I never tested if I had the s_min set to zero. I feel like it should always start the restart segment one step before the zero.
we could also force the restart segment first sigma to t_max. i don't currently have that enabled, but i believe everything still works doing that. however, i don't know what the rules are for diffusion sampling - is it wrong to do that? we have to for s_min i think, but there might be leeway with that part.
Shouldn't the first sigma in the restart segment always be t_max/s_max already? And s_min should be the exact same too. I don't think I understand the problem.
I'll probably just remove simple_test after this PR.
it's actually working pretty well now, and i think it works better for some samplers like SDE. is it more like a SGM schedule?
I feel like it should always start the restart segment one step before the zero.
just doing calc_sigmas based on the penultimate step when the last one is 0 doesn't work because then you can easily get the first calculated sigma higher than s_max.
one thing that's possible is something like this (pseudocode from build_plan_items) -
how it currently works:
segments = round_restart_segments(sigmas, restart_segments)
# ... in a restart segment
s_min = sigmas[i+1]
restart_sigmas = calc_sigmas(s_min, blahblah)[:-1]
possible change:
segments = round_restart_segments(sigmas if sigmas[-1] > 0 else sigmas[:-1], restart_segments)
# ... in a restart segment
s_min = sigmas[i+1] if sigmas[-1] > 0 else sigmas[1]
restart_sigmas = calc_sigmas(s_min, blahblah)[:-1]
seems kind of complicated (also not sure the results when actually sampling were better than the other possibilities - they might be worse).
my current solution is to just use max(model_sampling.sigma_min, s_min) which seems to work okay and seems reasonable in principle?
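concretely it's roughly this (sketch - model_sampling is comfy's model sampling object with sigma_min/sigma_max; the sigma_max clamp is the optional part discussed further down):

def clamp_segment_bounds(s_min, s_max, model_sampling):
    # never ask the scheduler for a sigma below the model's own sigma_min
    s_min = max(float(model_sampling.sigma_min), s_min)
    # matching clamp on the high side, just for consistency
    s_max = min(float(model_sampling.sigma_max), s_max)
    return s_min, s_max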
there is a further consideration also: when the restart segment is going all the way down to 0, then we may also need to reduce the restart steps (because we can't do restart_sigmas[:-1]). or special case the situation where the restart steps value is 1 to not even use a schedule and just make the sigmas [s_max, 0]. (i prefer letting it add an extra restart step, i don't know how correct that is though).
Shouldn't the first sigma in the restart segment always be t_max/s_max already? And s_min should be the exact same too.
well, i don't know what should happen, but it's definitely not the case in practice. when you call a scheduler (basically calc_sigmas) it does not necessarily return sigmas starting and ending with the min/max you gave it ("ending" would be the penultimate sigma). example adding a little debug code to the calc_sigmas function:
CALC min=0.029167, max=14.615, n=20: tensor([14.6146, 10.7468, 8.0815, 6.2049, 4.8557, 3.8654, 3.1238, 2.5572, 2.1157, 1.7648, 1.4806, 1.2458, 1.0481, 0.8784, 0.7297, 0.5964, 0.4736, 0.3555, 0.2322, 0.0292, 0.0000])
CALC min=0.35555, max=0.59, n=3: tensor([0.5908, 0.4708, 0.3552, 0.0000])
CALC min=0.029167, max=0.3, n=3: tensor([0.3006, 0.2004, 0.0292, 0.0000])
that's normal/normal 20 steps, calculating the main sigmas and the two default restart segments. just to be clear, min/max/n are the parameters passed to calc_sigmas and after the colon it's the sigmas the schedule generated. so for the main sigmas we asked for a max of 14.615 but we actually got 14.6146 and we asked for a min (before 0) of 0.029167 but we actually got 0.0292.
okay, with this last commit i think we're almost there. i wasn't happy with how much code this pull was adding and i'm bad about overengineering/overcomplicating stuff. i've managed to simplify it a fair amount.
most of the extra code now is node definitions which are unavoidable and validation/test stuff which is good to have.
remaining to do:
- whether max(model_sigma_min, s_min) is good enough or whether some other approach is needed (it seems to work fine)
- whether to force the restart segment's first sigma to s_max or leave what calc_sigmas returned (results seem reasonable in either case, the current version is taking the second approach)

this isn't ready to merge until those last items are resolved but i'm going to mark this ready for review now.
edit: by the way, i snuck in a small feature where you can set a restart segment to "default" or "a1111" to insert those segment presets at that point. i.e. you can now do [custom_segment_definition, "a1111", another_custom_segment]. this is fully compatible with existing definitions as a segment with a single string value was previously invalid.
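a sketch of how the expansion works (the preset values here are just for illustration, not necessarily the real presets):

SEGMENT_PRESETS = {
    "default": [[3, 2, 0.06, 0.30], [3, 1, 0.30, 0.59]],
    "a1111": [[2, 1, 0.1, 2.0]],  # placeholder values
}

def expand_segments(segments):
    # a bare string in the segment list gets replaced by that preset's
    # segments at the same position
    expanded = []
    for seg in segments:
        if isinstance(seg, str):
            expanded.extend(SEGMENT_PRESETS[seg])
        else:
            expanded.append(seg)
    return expanded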
Sorry for the late reply, I've been a bit distracted.
I'll probably just remove simple_test after this PR.
it's actually working pretty well now, and i think it works better for some samplers like SDE. is it more like a SGM schedule?
I don't quite remember why I have it, but I believe when I was first adding in the schedulers for the restart segments, there seemed to be two ways to adapt the simple scheduler for the restart segments. I can't say that I know what SGM is.
my current solution is to just use max(model_sampling.sigma_min, s_min) which seems to work okay and seems reasonable in principle?
So this is for avoiding s_min being zero? That seems pretty reasonable to me.
well, i don't know what should happen, but it's definitely not the case in practice. when you call a scheduler (basically calc_sigmas) it does not necessarily return sigmas starting and ending with the min/max you gave it ("ending" would be the penultimate sigma). example adding a little debug code to the calc_sigmas function:
My bad, I forgot that it is based on the model sigmas rather than the provided sigmas. We should probably stick with the model sigmas rather than force it to the provided ones.
by the way, i snuck in a small feature where you can set a restart segment to "default" or "a1111" to insert those segment presets at that point. i.e. you can now do [custom_segment_definition, "a1111", another_custom_segment]. this is fully compatible with existing definitions as a segment with a single string value was previously invalid.
So it's basically adding the segments that those special presets would add, on top of the custom ones? Would entering "a1111" multiple times duplicate the segments that it would generate?
So I haven't looked too far into your code, but I wanted to clarify the rough idea. The scheduler adds the restart segments, then the sampler refinds all the segments, then does the algorithm?
Sorry for the late reply, I've been a bit distracted.
not a problem, this isn't time sensitive or anything.
there seemed to be two ways to adapt the simple scheduler for the restart segments.
i see. i'm actually curious what adapting a scheduler for restart segments means though - what's the difference between the normal "normal" scheduler and the restart "normal" scheduler?
I can't say that I know what SGM is.
me either. :) some samplers like dpmpp 3m SDE only like some schedules - it seems like the SDE samplers like karras and will also work with sgm_uniform. "the golden scheduler" (custom scheduler) also has an SGM option that makes it work better for dpmpp 2m SDE at least.
restart's simple scheduler doesn't work with dpmpp 2m SDE:
** Dumping restart sampling plan (total steps 20):
[ ] Step 1..20 : 14.61, 10.9, 8.303, 6.443, 5.088, 4.082, 3.321, 2.735, 2.276, 1.91, 1.613, 1.367, 1.161, 0.9838, 0.8299, 0.6932, 0.5693, 0.4538, 0.3417, 0.02917, 0.0
(testing without any restart segments)
however the builtin simple does:
** Dumping restart sampling plan (total steps 20):
[ ] Step 1..20 : 14.61, 10.9, 8.303, 6.443, 5.088, 4.082, 3.321, 2.735, 2.276, 1.91, 1.613, 1.367, 1.161, 0.9838, 0.8299, 0.6932, 0.5693, 0.4538, 0.3417, 0.223, 0.0
simple_test also does:
** Dumping restart sampling plan (total steps 20):
[ ] Step 1..20 : 14.53, 10.9, 8.303, 6.443, 5.088, 4.082, 3.321, 2.735, 2.276, 1.91, 1.613, 1.367, 1.161, 0.9838, 0.8299, 0.6932, 0.5693, 0.4538, 0.3417, 0.223, 0.0
seems like the difference between built in simple and restart simple is the penultimate sigma: 0.223 vs 0.02917 (that's the model sigma_min i think). actually, discarding the penultimate sigma for simple_test also works.
So this is for avoiding s_min being zero? That seems pretty reasonable to me.
correct, or maybe it should be phrased as avoiding going below the model's sigma_min. we could also do min(s_max, model_sampling.sigma_max) to be consistent. don't think going above the model's sigma_max is an issue in practice though.
My bad, I forgot that it is based on the model sigmas rather than the provided sigmas. We should probably stick with the model sigmas rather than force it to the provided ones.
i don't fully understand this. are you saying i need to change something? i don't think i can change the call to the schedule for restart segments to use the model sigma_min, sigma_max because then i'd get a schedule starting at the extremes and not in the middle like a restart segment needs to be. but we do at least need to avoid passing it values below the model's sigma_min.
So it's basically adding the segments that those special presets would add, on top of the custom ones?
it adds the segments at the point where they're specified. so if the user specifies:
[4,4,3,4],"default"
that will be the same as
[4,4,3,4],[3,2,0.06,0.30],[3,1,0.30,0.59]
and if they specify:
"default",[4,4,3,4]
that will be the same as:
[3,2,0.06,0.30],[3,1,0.30,0.59],[4,4,3,4]
Would entering "a1111" multiple times duplicate the segments that it would generate?
yes, that's right. obviously you wouldn't want to do that. :) also "a1111","default" doesn't work because the segments overlap.
The scheduler adds the restart segments, then the sampler refinds all the segments, then does the algorithm?
generating the plan works the same as before except for ensuring s_min doesn't go below the model's sigma_min. after that there are some changes that apply for both the existing samplers and the plugin sampler:
the plan gets converted to sigmas, which is also what the RestartScheduler node will return. this is just concatenating all the plan item sigmas with one slight difference: if the first normal sigma matches the previous chunk's last sigma then we use normal_sigmas[1:] to avoid duplicate sigmas (since after a restart segment, the normal segment will start where the restart segment ended now that we're forcing it to s_min).
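in sketch form (plan_items here just stands for a list of per-item sigma tensors, not the actual data structure in the pull):

import torch

def plan_to_sigmas(plan_items):
    chunks = []
    for item_sigmas in plan_items:
        # skip the first sigma when it repeats the previous item's last sigma
        if chunks and item_sigmas[0] == chunks[-1][-1]:
            item_sigmas = item_sigmas[1:]
        chunks.append(item_sigmas)
    return torch.cat(chunks)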
i don't try to rebuild the exact plan as before - that was needlessly complicated. now i only split into chunks at the start of restart segments where we find a sigma that increases instead of decreases. the noise for a restart segment is calculated from previous_chunk[-1] (s_min) and chunk[0] (s_max). this also has the added benefit of combining segments better than the previous approach. for example:
** Dumping restart sampling plan (total steps 26):
[ ] Step 1..14 : 14.61, 11.73, 9.34, 7.384, 5.789, 4.5, 3.465, 2.641, 1.991, 1.483, 1.091, 0.7909, 0.5647, 0.3964, 0.273
[R 1] Step 15..16 : 0.59, 0.4056, 0.273
[ ] Step 17..20 : 0.273, 0.1842, 0.1213, 0.07786, 0.04848
[R 1] Step 21..22 : 0.3, 0.1279, 0.04848
[R 2] Step 23..24 : 0.3, 0.1279, 0.04848
[ ] Step 25..26 : 0.04848, 0.02917, 0.0
would previously call the sampler with each of the lines of sigmas from the plan as a separate chunk, now it would be:
result of converting plan to sigmas:
[14.6146, 11.7254, 9.3402, 7.3836, 5.7894, 4.4998, 3.4647, 2.6408, 1.9909, 1.4832, 1.0908, 0.7909, 0.5647, 0.3964, 0.2730, 0.5900, 0.4056, 0.2730, 0.1842, 0.1213, 0.0779, 0.0485, 0.3000, 0.1279, 0.0485, 0.3000, 0.1279, 0.0485, 0.0292, 0.0000]
sampling chunks:
14.61, 11.73, 9.34, 7.384, 5.789, 4.5, 3.465, 2.641, 1.991, 1.483, 1.091, 0.7909, 0.5647, 0.3964, 0.273 <- used as s_min for restart chunk below
0.59, 0.4056, 0.273, 0.1842, 0.1213, 0.07786, 0.04848 <- s_min for restart below
0.3, 0.1279, 0.04848 <- s_min for restart below
0.3, 0.1279, 0.04848, 0.02917, 0.0
hopefully that makes sense. long story short, making the plan is basically the restart algorithm and that works like it did before.
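the chunk splitting is roughly this (simplified sketch, not the exact code in the pull):

def split_sigmas_into_chunks(sigmas):
    # split wherever a sigma increases - that marks the start of a restart segment
    chunks, start = [], 0
    for i in range(1, len(sigmas)):
        if sigmas[i] > sigmas[i - 1]:
            chunks.append(sigmas[start:i])
            start = i
    chunks.append(sigmas[start:])
    # the restart noise for a chunk then comes from previous_chunk[-1] (s_min) and chunk[0] (s_max)
    return chunks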
(by the way, if/when you merge this i'd recommend squashing it so you don't clog up your commit log with all my iterating)
i see. i'm actually curious what adapting a scheduler for restart segments means though - what's the difference between the normal "normal" scheduler and the restart "normal" scheduler?
Some of the schedulers don't expect an s_max and s_min, so I had to incorporate that into them to get it to work. Although now that I think about it, there probably is a way to work with the original scheduler.
restart's simple scheduler doesn't work with dpmpp 2m SDE: ... however the builtin simple does: ... simple_test also does:
Huh, that's interesting. Completely incidental.
correct, or maybe it should be phrased as avoiding going below the model's sigma_min. we could also do min(s_max, model_sampling.sigma_max) to be consistent. don't think going above the model's sigma_max is an issue in practice though.
I agree. I'd say it's up to the user if they want to go above the model's sigma_max
i don't fully understand this. are you saying i need to change something? i don't think i can change the call to the schedule for restart segments to use the model sigma_min, sigma_max because then i'd get a schedule starting at the extremes and not in the middle like a restart segment needs to be. but we do at least need to avoid passing it values below the model's sigma_min.
Sorry, I meant that some schedulers don't exactly stick with the given s_max and s_min to make a schedule between s_max and s_min. The ones like simple and normal try to keep the same sigmas that the model is trained on, and that's why the returned s_max and s_min might not be the same as the given ones. I guess it wouldn't be that big of a deal if we force it or not, but I'd lean towards keeping the ones given back from the scheduler. Plus, you're already doing it that way.
i don't try to rebuild the exact plan as before - that was needlessly complicated. now i only split into chunks at the start of restart segments where we find a sigma that increases instead of decreases. the noise for a restart segment is calculated from previous_chunk[-1] (s_min) and chunk[0] (s_max). this also has the added benefit of combining segments better than the previous approach. for example: would previously call the sampler with each of the lines of sigmas from the plan as a separate chunk
I think I get it. I'll have to look at the code a bit closer sometime. This all could have been much easier if I forced the restart sampler to be the same as the sampler for the normal parts, but I liked the idea of giving that option.
Some of the schedulers don't expect an s_max and s_min, so I had to incorporate that into them to get it to work.
ah, i see. i thought there was something special about the actual schedules that got created. if that's the case, then there's no need to restrict the choice of schedulers for the main sigmas, only for the "restart schedule". correct?
I'd say it's up to the user if they want to go above the model's sigma_max
sounds fine to me. right now it is getting clamped to the sigma_max but i'll revert that change.
I guess it wouldn't be that big of a deal if we force it or not, but I'd lean towards keeping the ones given back from the scheduler. Plus, you're already doing it that way.
just for the max sigma, so i'm halfway doing it that way.
This all could have been much easier if I forced the restart sampler to be the same as the sampler for the normal parts, but I liked the idea of giving that option.
you mean schedule, not sampler. correct? i love options and hate restrictions so i think that choice is great!
from my side, i think remaining is just more testing + updating the README. is there anything you don't like about this pull currently that you want me to change?
one thing i was thinking of doing is adding start/end step to the restart scheduler node since it's using the same backend code as the stand alone samplers that support passing the step range.
ah, i see. i thought there was something special about the actual schedules that got created. if that's the case, then there's no need to restrict the choice of schedulers for the main sigmas, only for the "restart schedule". correct?
That's correct. There's probably an easy way to support any scheduler by now.
I guess it wouldn't be that big of a deal if we force it or not, but I'd lean towards keeping the ones given back from the scheduler. Plus, you're already doing it that way.
just for the max sigma, so i'm halfway doing it that way.
This wasn't a problem before? Or was it always there, but it was just overlooked?
This all could have been much easier if I forced the restart sampler to be the same as the sampler for the normal parts, but I liked the idea of giving that option.
you mean schedule, not sampler. correct? i love options and hate restrictions so i think that choice is great!
I suppose both sampler and schedule. Instead of switching between samplers, it could have just been one sampler call, with some occasional added noise in the middle. It could be even more customizable by changing both sampler and schedule for each chunk/segment, since the algorithm boils down to a chain of KSampler nodes.
from my side, i think remaining is just more testing + updating the README. is there anything you don't like about this pull currently that you want me to change?
I'll take a look soon.
one thing i was thinking of doing is adding start/end step to the restart scheduler node since it's using the same backend code as the stand alone samplers that support passing the step range.
So that's the same as the advanced KSampler start/end step parameters - wouldn't you want it in the restart sampler node instead?
That's correct. There's probably an easy way to support any scheduler by now.
i don't know how to, but if you find a way i could definitely add that. this pull does currently add the ability to use any scheduler for the main sigmas (since the restart scheduler node supports an optional sigmas connection and will just use that instead of generating its own sigmas).
i don't know how you would do it for other schedulers though. some of them also don't have callable functions necessarily, like the recently added Align Your Steps scheduler is only a node that returns sigmas as far as i know.
This wasn't a problem before? Or was it always there, but it was just overlooked?
ah, nope. i added it after i asked about that, just to be consistent but i agree it's not necessary. although i don't see a way it would be useful in practice. the only case it would make a difference is if a restart segment t_max was above the model max sigma which would mean adding more noise than samplers normally add for denoise 1.0. so that would be effectively deleting the image and starting again, right? i don't think any of the existing content would ever survive adding model sigma_max noise, let alone going above that.
I suppose both sampler and schedule.
i like the idea, though i'm not sure how you could switch for the main schedule. you'd need to have sigmas already to find the point where the restart segments could be inserted. it would definitely be possible to use different schedules for each restart segment though, and different samplers for every segment.
i won't mess with that in this pull, but maybe later on... i think it would be possible to do that in a backward compatible way, like you could specify a segment as [steps, restarts, tmin, tmax, "schedule"] or even ["schedule1", "schedule2", "scheduleN"]
I'll take a look soon.
no problem!
So that's the same as the advanced KSampler start/end step parameters - wouldn't you want it in the restart sampler node instead?
i think it would be kind of hard to do it that way and possibly confusing from a user perspective because it's already common for scheduler nodes to have a denoise parameter which will adjust the output sigmas but there are no sampler nodes that take schedule adjustment options (that i know of, anyway).
it would be hard from a code perspective since denoise/step range is currently done in the planning phase and those adjustments should occur in the initial schedule rather than after the restart segments were added.
if you think it's important to do it like that i can try to figure out a way, though i think it would complicate the code significantly.
the only case it would make a difference is if a restart segment t_max was above the model max sigma which would mean adding more noise than samplers normally add for denoise 1.0. so that would be effectively deleting the image and starting again, right? i don't think any of the existing content would ever survive adding model sigma_max noise, let alone going above that.
Yeah, I'd guess it'd just be like starting again but with a different seed.
i think it would be kind of hard to do it that way and possibly confusing from a user perspective because it's already common for scheduler nodes to have a denoise parameter which will adjust the output sigmas but there are no sampler nodes that take schedule adjustment options (that i know of, anyway).
it would be hard from a code perspective since denoise/step range is currently done in the planning phase and those adjustments should occur in the initial schedule rather than after the restart segments were added.
if you think it's important to do it like that i can try to figure out a way, though i think it would complicate the code significantly.
I was just asking. I haven't looked into the custom sampler node, so I don't know what parts are in charge of which params/processes. If it belongs in the scheduler node then that's where it should be.
I think it's ready to merge, if there's nothing else you want to change then I'll do the merge.
I think it's ready to merge, if there's nothing else you want to change then I'll do the merge.
i'm pretty sure it's good but i've been a bit distracted with the hidiffusion stuff. i'll be up early tomorrow so i'll double check some stuff and get back to you within 12 hours of this post and you can merge tomorrow if no issues emerge.
i've mostly tested the advanced stuff, i did check with the existing integrated samplers but i'll just double check them to make sure nothing happened in between. also the other main thing i'd be worried about is setting denoise and start/end steps. it's actually broken currently (by me, unfortunately) and i think this pull will fix it. i had to make several revisions though.
Take your time. That hidiffusion stuff looks pretty neat.
okay, i think we're finally ready! i made two small changes: adding a little more information to the README about setting denoise in the RestartScheduler node when passing it sigmas, and making the plan generation respect the existing comfy discard penultimate sigma logic. that is only possible when passing the sampler by name (so in the normal restart sampler nodes, but not the custom restart sampler or when using a custom sampler + RestartScheduler and RestartSampler nodes). with this change and setting segments empty i get identical results compared to the built in nodes.
the discard penultimate sigma stuff might not be that important. right now it applies only to dpm2 and dpm2 ancestral - the other two samplers it may apply to are the unipc ones, but you filter those out of the list.
some thoughts (probably to be done in a different pull):
- it might be useful to add a discard_penultimate_sigma parameter to RestartScheduler and maybe also the advanced restart sampler nodes. i actually get pretty good results with SDE samplers setting this. not sure why comfy doesn't do that, but normally dpmpp_3m_sde doesn't work too well - discarding the penultimate sigma seems to help quite a bit.

That hidiffusion stuff looks pretty neat.
yeah, it seems to work better than Deep Shrink for SD 1.5 at least. MSW-MSA attention is a nice speedup too. results with SDXL aren't so great though, it may just be an issue with my implementation.
Deep Shrink and the RAUNet stuff also pair well with restart sampling because the restarts can help clean up artifacts from trying to generate at high resolution.
by the way, it is possible for a user to do their own discard penultimate sigma with something like this:
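(a minimal sketch: drop the second-to-last sigma but keep the trailing 0.0)

import torch

def discard_penultimate_sigma(sigmas):
    return torch.cat((sigmas[:-2], sigmas[-1:]))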
and pass the sigmas to the new RestartScheduler node. so it's not impossible to get that behavior currently, just a little inconvenient.
it seems like unipc works fine now, possibly remove it from the filter list?
Go for it!
it might be useful to add a discard_penultimate_sigma parameter to RestartScheduler and maybe also the advanced restart sampler nodes. i actually get pretty good results with SDE samplers setting this. not sure why comfy doesn't do that, but normally dpmpp_3m_sde doesn't work too well - discarding the penultimate sigma seems to help quite a bit.
I don't mind, but if the builtin nodes don't allow for it, maybe it is better to leave it out to be consistent?
Deep Shrink and the RAUNet stuff also pair well with restart sampling because the restarts can help clean up artifacts from trying to generate at high resolution.
Wouldn't restarts cause artifacts if they are possible in the first place? It might clean up the artifacts that were created, but I feel like it can cause more artifacts after the re-noising?
Go for it!
actually, i changed my mind. it seems like it doesn't work so well with the default segments, results seem fairly reasonable for a1111 segments. may also depend on the scheduler. it's probably better to leave it disabled.
could possibly add another environment variable to disable filtering (in a different pull).
I don't mind, but if the builtin nodes don't allow for it, maybe it is better to leave it out to be consistent?
you have a point, but it would be nice to be able to use a larger range of samplers without having to do awkward stuff like use a different node pack.
in any case, that can be done in a different pull if ever. once this gets merged, i'll probably open a discussion with some ideas i have and we can see which ones you like.
Wouldn't restarts cause artifacts if they are possible in the first place? It might clean up the artifacts that were created, but I feel like it can cause more artifacts after the re-noising?
well, the idea would be to restart back to right when the scaling effects ended - that way restart can clean up artifacts interpolating the tensors. you generally don't want to rewind back into a deep shrink effect (at least not deep into it - hidiffusion seems to tolerate it better).
not scientific at all, just my perception of it but i do use deep shrink (and now hidiffusion) quite frequently. i actually rarely use the generate low res then upscale approach these days.
Yeah we can do a different PR for the filtering and the discard sigma setting.
well, the idea would be to restart back to right when the scaling effects ended - that way restart can clean up artifacts interpolating the tensors. you generally don't want to rewind back into a deep shrink effect (at least not deep into it - hidiffusion seems to tolerate it better).
This would probably make more sense to me if I looked into how these techniques work. But it's cool that combining this with those techniques yields better results.
not scientific at all, just my perception of it but i do use deep shrink (and now hidiffusion) quite frequently. i actually rarely use the generate low res then upscale approach these days.
I'll have to try it if I decide to get back into image generation again.
I'll go ahead and merge this if that's good with you.
This would probably make more sense to me if I looked into how these techniques work.
the concept is pretty simple - scale down an early input layer (or possibly multiple) when generating at high resolution to let the model deal with something more like what it was trained on and then scale it back up in the corresponding output layer. scaling can cause artifacts though, and generally loss of detail so usually the effect ends around 35%-60% of sampling. at that point the model has fixed major details in place and can just refine it/add detail. both Kohya Deep Shrink and Hidiffusion use that concept, the only real difference is the scaling methods and where the scaling occurs.
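very hand-wavy illustration of the scaling idea (plain torch, not Deep Shrink's or HiDiffusion's actual patching code):

import torch.nn.functional as F

def shrink_input_block(h, scale=0.5, active=True):
    # downscale the feature map going into an early input block so the model
    # works at something closer to its training resolution
    if not active:
        return h
    return F.interpolate(h, scale_factor=scale, mode="bilinear", align_corners=False)

def restore_output_block(h, original_size, active=True):
    # scale the matching output block's result back up to the original size
    if not active:
        return h
    return F.interpolate(h, size=original_size, mode="bilinear", align_corners=False)

# 'active' would come from sampling progress, e.g. progress < 0.5, so the
# effect ends partway through sampling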
I'll go ahead and merge this if that's good with you.
all set! hopefully i didn't miss anything super dumb.
the concept is pretty simple - scale down an early input layer (or possibly multiple) when generating at high resolution to let the model deal with something more like what it was trained on and then scale it back up in the corresponding output layer. scaling can cause artifacts though, and generally loss of detail so usually the effect ends around 35%-60% of sampling. at that point the model has fixed major details in place and can just refine it/add detail. both Kohya Deep Shrink and Hidiffusion use that concept, the only real difference is the scaling methods and where the scaling occurs.
Oh that makes a lot of sense. Thanks for the explanation.
all set! hopefully i didn't miss anything super dumb.
I'm sure we'll find out if there's a problem
Ah, forgot to squash. Oh well.
welcome to being the #2 contributor in your own repo. :tada: :)
welcome to being the #2 contributor in your own repo. 🎉 :)
Lol
very preliminary proof of concept start at making restart a SAMPLER rather than a stand alone sampler node. just putting this here to facilitate discussion.