vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: Option to Force GC after every operation. #3267

Closed · brknsoul closed this issue 3 months ago

brknsoul commented 3 months ago

Feature description

For those with 8GB or less VRAM, a model is sometimes loaded (e.g. a VAE or ESRGAN upscaler) that doesn't push memory usage past the GC threshold; the next operation's model is then loaded on top of it, causing VRAM usage to spill into shared RAM and slowing that operation significantly.

While one could set the GC threshold to a very low value, that can cause unnecessary GC triggers.

A nicer feature would be an option to trigger GC after each individual operation.
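
To make the proposal concrete, here is a minimal sketch of what a threshold-plus-force GC helper could look like; the `maybe_gc` name, the `GC_THRESHOLD` constant, and the overall structure are illustrative assumptions, not SD.Next's actual implementation.

```python
import gc
import torch

# Illustrative threshold, expressed as percent of total VRAM in use.
GC_THRESHOLD = 80

def maybe_gc(force: bool = False) -> None:
    """Collect only when VRAM use crosses the threshold, or when forced."""
    if not torch.cuda.is_available():
        gc.collect()
        return
    used = torch.cuda.memory_allocated()
    total = torch.cuda.get_device_properties(0).total_memory
    utilization = 100 * used / total
    if force or utilization >= GC_THRESHOLD:
        gc.collect()                  # drop Python-side references
        torch.cuda.empty_cache()      # return cached VRAM blocks to the driver
        torch.cuda.ipc_collect()      # clean up inter-process CUDA handles

# The proposed option would amount to calling maybe_gc(force=True)
# after each operation (VAE decode, upscale, etc.).
```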

Version Platform Description

n/a

vladmandic commented 3 months ago

But... define "every operation"?

Because even a simple txt2img run has at least three core stages (text encode, denoise, VAE decode), each pulling different model parts into VRAM.

Then add things like LoRA load/unload, IP-Adapter, ControlNet execution, etc.

And add VAE encode as well if doing img2img or inpaint.

So forcing GC after every operation ends up exactly where we are today if you set the threshold to 0.

And each GC costs a fraction of a second on average, and also causes the model to deoptimize its JIT path.
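
For a rough sense of that cost on a given setup, one can time a forced collection directly; this snippet is purely illustrative, and the number it prints is machine-dependent rather than the figure referenced above.

```python
import gc
import time
import torch

def timed_gc() -> float:
    """Force a full GC pass and return how long it took, in seconds."""
    start = time.perf_counter()
    gc.collect()                      # Python-side collection
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait for in-flight kernels first
        torch.cuda.empty_cache()      # release cached VRAM back to the driver
    return time.perf_counter() - start

print(f"forced GC took {timed_gc():.3f}s")
```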

I'm open to suggestions.

Disty0 commented 3 months ago

There were some places where GC was missing. This should fix them: https://github.com/vladmandic/automatic/commit/092a326c09e8ba81e5d913f34a4499ad7bcf92e1

vladmandic commented 3 months ago

thanks! so is there anything left to do here?

btw - an idea: maybe we should set the GC threshold to zero on lowvram systems and stop the log spam?

brknsoul commented 3 months ago

I dunno if it's a ZLUDA bug or a bug with torch GC, but setting the GC threshold to 0 'crashes' SD.Next (the app stops after reading references.json, which is where GC would first happen).

Disty0 commented 3 months ago

> btw - an idea: maybe we should set the GC threshold to zero on lowvram systems and stop the log spam?

I agree. Also, I think we should reduce the logs to debug level for non-lowvram systems as well.

> I dunno if it's a ZLUDA bug or a bug with torch GC, but setting the GC threshold to 0 'crashes' SD.Next (the app stops after reading references.json, which is where GC would first happen).

This doesn't happen on ROCm or IPEX on my end. Probably a ZLUDA bug.

vladmandic commented 3 months ago

new function - not forcing the threshold to 0 if ZLUDA is used, just because we haven't gotten to the bottom of that yet.

threshold = 0 if (shared.cmd_opts.lowvram and not shared.cmd_opts.use_zluda) else shared.opts.torch_gc_threshold

also improved logging, this is an example:

GC: utilization={'gpu': 8, 'ram': 20, 'threshold': 0} gc={'collected': 510, 'saved': 0.25} before={'gpu': 1.93, 'ram': 9.29} after={'gpu': 1.68, 'ram': 9.29, 'retries': 0, 'oom': 0} device=cuda fn=vae_decode time=0.25
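
For reference, here is a sketch of how the before/after GPU figures in a log line like that could be gathered. The `gc_with_stats` helper and its field names are assumptions modeled on the example output, not the project's actual code, and it uses `torch.cuda.memory_reserved()` on the assumption that the logged 'gpu' values track reserved (cached) memory in GiB.

```python
import gc
import time
import torch

def gc_with_stats(fn: str = "vae_decode") -> dict:
    """Run a GC pass and return stats shaped roughly like the log line above."""
    gib = 1024 ** 3
    cuda = torch.cuda.is_available()
    before = torch.cuda.memory_reserved() / gib if cuda else 0.0
    t0 = time.perf_counter()
    collected = gc.collect()          # number of Python objects collected
    if cuda:
        torch.cuda.empty_cache()      # shrink the CUDA caching allocator
    after = torch.cuda.memory_reserved() / gib if cuda else 0.0
    return {
        'gc': {'collected': collected, 'saved': round(before - after, 2)},
        'before': {'gpu': round(before, 2)},
        'after': {'gpu': round(after, 2)},
        'fn': fn,
        'time': round(time.perf_counter() - t0, 2),
    }

print(gc_with_stats())
```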