invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[enhancement]: option to unload from memory #6856

Open aaronbolton opened 2 weeks ago

aaronbolton commented 2 weeks ago

What should this feature add?

A command-line option to unload the model from RAM after a defined period of time.

Alternatives

Running InvokeAI as a container and using Sablier to shut down the container after some time. The downside is that if no traffic is seen through the web interface, the container will be shut down even if jobs are still running.
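For context, a rough sketch of that workaround with docker-compose and Sablier's container labels. The image name and label keys are written from memory of the respective docs, so treat them as assumptions and verify before use:

```yaml
# docker-compose.yml — sketch of the Sablier-based workaround, not a tested config
services:
  invokeai:
    image: ghcr.io/invoke-ai/invokeai   # assumed image name; use whatever image you normally run
    labels:
      - sablier.enable=true     # lets Sablier stop/start this container on demand (assumed label)
      - sablier.group=invokeai  # group referenced by the reverse-proxy middleware (assumed label)
# Sablier sits behind a reverse proxy (e.g. Traefik) whose middleware defines the idle timeout.
# Because it only sees HTTP traffic, it can stop the container while a queued generation is
# still running — which is exactly the downside described above.
```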

Additional Content

No response

psychedelicious commented 2 weeks ago

What's the use case for this?

aaronbolton commented 2 weeks ago

It would help with freeing up memory for other applications such as Ollama. Ollama has a similar feature where it unloads models from memory after 5 minutes by default, but this is configurable.
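For comparison, the Ollama behaviour referenced here is controlled by its OLLAMA_KEEP_ALIVE setting (a documented Ollama environment variable); the compose wrapper below is only an illustrative sketch:

```yaml
# docker-compose.yml — illustrative only; shows Ollama's analogous keep-alive knob
services:
  ollama:
    image: ollama/ollama
    environment:
      - OLLAMA_KEEP_ALIVE=10m   # unload models 10 minutes after last use (Ollama's default is 5m)
```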

systemofapwne commented 4 days ago

Invoke already (partially) unloads models from VRAM if you set memory bounds. However, this only seems to work properly when you let the generation finish on its own, not if you abort the queue. I reported this some time ago, but the devs seem to be busy with the large number of reported issues: https://github.com/invoke-ai/InvokeAI/issues/6759

Jonseed commented 2 days ago

@aaronbolton you might like my new Ollama Node for help expanding prompts in Invoke.

By the way, I have also noticed that Invoke doesn't seem to free up memory after a generation, so Ollama runs slower on subsequent generations, probably because it is offloading most of its processing to the CPU. It would be great if Invoke had an option to fully free the memory after each generation, for use cases like this. (My node has a toggle to unload the model from Ollama after generating the expanded prompt.)

fahadshery commented 16 hours ago

I would like to second that. Ollama unloads models from VRAM after a timeout (default 5 minutes). It would be nice if Invoke unloaded models from the GPU once generation is complete: wait for a timeout, and if there has been no further generation, unload the model completely. Currently I have to restart Invoke just to unload the model from VRAM.

psychedelicious commented 15 hours ago

You can set lazy_offload: false in the invokeai.yaml config file and the app will actively offload models.
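For anyone landing here, a minimal sketch of that setting in invokeai.yaml (only the key mentioned above is shown; depending on your version's config schema it sits at the top level of the file):

```yaml
# invokeai.yaml — minimal sketch showing only the relevant key
lazy_offload: false   # offload models from VRAM eagerly instead of lazily
```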

fahadshery commented 5 hours ago

You can set lazy_offload: false in the invokeai.yaml config file and the app will actively offload models.

It still gobbles up some VRAM when sitting idle. Is it not possible to fully unload?

psychedelicious commented 4 hours ago

It only offloads down to the configured vram cache setting. There's no way at the moment to forcibly offload all models.

I think it'd be fairly straightforward to add an endpoint to do this, open to contributions.
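Until such an endpoint exists, a partial workaround is to shrink the VRAM cache ceiling so that very little stays resident between generations. A sketch, assuming the cache-size key is called vram as in recent config schemas (key names have changed across InvokeAI releases, so check the config reference for your version):

```yaml
# invokeai.yaml — sketch; key names may differ between InvokeAI releases
lazy_offload: false   # offload eagerly, as suggested above
vram: 0.25            # assumed key for the VRAM model-cache size in GB; lower means less stays loaded
```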