lllyasviel / stable-diffusion-webui-forge


fix/workaround for potential memory leak #2315

Closed DenOfEquity closed 1 week ago

DenOfEquity commented 1 week ago

The `free_memory` function in `backend/memory_management.py` only unloads models from the torch device, and preferentially moves some things to the offload device. So if a model has previously been moved off the torch device, it will not be fully unloaded. On my computer this happens with text encoders (the JointTextEncoder model type; I've never seen an extra KModel or IntegratedAutoencoderKL) on every new checkpoint load. Going back to a previously used checkpoint also creates another copy rather than reusing the one that already exists. Committed memory keeps increasing until there is none left.

This PR adds a few lines to force unload these abandoned models. Possibly not the ideal fix, but it seems like a reasonable safety check to have even if a better fix is found.
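A minimal sketch of the kind of safety check described, assuming a `current_loaded_models` list of wrapper objects with a `.model` attribute and an unload method (`model_unload()` is an assumed name here; this is not the actual diff):

```python
import sys

def unload_abandoned_models(current_loaded_models):
    # Walk the list backwards so popping entries doesn't shift later indices.
    for i in range(len(current_loaded_models) - 1, -1, -1):
        loaded = current_loaded_models[i]
        # In CPython, getrefcount() <= 2 means only the wrapper's own attribute
        # (plus the temporary reference created by the getrefcount call itself)
        # still points at the model, i.e. nothing else is using it.
        if sys.getrefcount(loaded.model) <= 2:
            loaded.model_unload()          # assumed unload method on the wrapper
            current_loaded_models.pop(i)
```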

~Also made the sd_models.unload_model_weights() function more aggressive - it fully unloads all models. This function is used in two places:~

Juqowel commented 1 week ago

With my extension there is still 1 extra model (in RAM or VRAM).

Fix: change `if sys.getrefcount(current_loaded_models[i].model) <= 2:` to `if sys.getrefcount(current_loaded_models[i].model) <= 3:`
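For context, a small standalone illustration (not Forge code) of why one extra live reference, such as a handle held by an extension, shifts the count that `sys.getrefcount` reports:

```python
import sys

class DummyModel:
    pass

wrapper_ref = DummyModel()
# In CPython: one reference from `wrapper_ref` plus the temporary argument
# reference created by the getrefcount() call itself -> prints 2.
print(sys.getrefcount(wrapper_ref))

extension_ref = wrapper_ref  # e.g. an extension keeping its own handle
# Now `wrapper_ref`, `extension_ref`, and the argument reference -> prints 3.
print(sys.getrefcount(wrapper_ref))
```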

DenOfEquity commented 1 week ago

In all my testing, 3 references is normal for the JointTextEncoder type while it is still in use. The 3rd reference is removed on a new model load (when a new JointTextEncoder is created), so that's when the unload happens. Changing to 3 could cause unnecessary reloads of text encoders.

Juqowel commented 1 week ago

> Changing to 3 could cause unnecessary reloads of text encoders.

In what case can I expect such behavior? I have not seen it yet.

But we can return to this later. For now I'm on 98e0adcc78c1641cc9352bfdad8843bfb49cfc19 without the extension. The changes in sd_models don't work with the Unload all models button. Just this, without any effect:

[attached image]

Also, can we move Actions from Other to the top of the list? Maybe in another PR.

DenOfEquity commented 1 week ago

I'll revert the `unload_model_weights()` change as I'm unsure exactly what's going on. Sometimes it seems to do nothing; other times it results in the expected drop in RAM / VRAM / committed memory. It's less relevant after the memory management change. Moving 'Actions' to be more quickly accessible seems reasonable, but is out of scope for this PR.

moudahaddad14 commented 1 day ago

@DenOfEquity it still doesn't work for me. I'm still facing the same problem of getting stuck midway through generation (at 50%) unless I enable the Enable for UNet (always maximize offload) checkbox, which kind of fixes the problem but slows generation compared to without it. I have an RTX 2070S (8GB VRAM) and 24GB of system RAM, and I'm using NF4 V1 btw!

DenOfEquity commented 1 day ago

This fix is related to memory issues after changing models (usually many times). Your case is different. Make sure you've followed the guidance in #1474 and that other apps aren't using VRAM. Also, Swap method: Queue and Swap location: CPU are the most likely to be reliable. I can use Flux NF4v2 on a 1070 with 8GB VRAM and 16GB RAM.