Open aaronbolton opened 2 weeks ago
What's the use case for this?
It would help with freeing up memory for other applications such as Ollama. Ollama has a similar feature where it unloads models from memory after 5 minutes by default, but this is configurable.
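For comparison, the Ollama behavior mentioned here is controlled by its `keep_alive` setting, which can be set globally via an environment variable or per request (the model name below is illustrative):

```shell
# Keep models loaded for 10 minutes instead of the 5-minute default
OLLAMA_KEEP_ALIVE=10m ollama serve

# Or per request: keep_alive: 0 asks Ollama to unload the model
# immediately after responding
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "hi", "keep_alive": 0}'
```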
Invoke already (partially) unloads models from VRAM if you set memory bounds. However, this only seems to work properly when you let the generation stop on its own, not if you abort the queue. I reported this some time ago, but the devs seem to be busy with the huge number of reported issues: https://github.com/invoke-ai/InvokeAI/issues/6759
@aaronbolton you might like my new Ollama Node for help expanding prompts in Invoke.
By the way, I have also noticed that Invoke doesn't seem to free up the memory after a generation, so Ollama runs slower on subsequent generations, probably because it is offloading most processing to the CPU. It would be great if Invoke had an option to totally free the memory after each gen, for use cases like this. (My node has a toggle to unload the model from Ollama after generating the expanded prompt.)
I would like to second that. Ollama unloads models from VRAM after a timeout (default 5 mins).
It would be nice if Invoke unloaded models from the GPU once generation completes, then waited for a timeout and, if no further generation happened, unloaded the model completely!
Currently I have to restart Invoke just to unload the model from VRAM.
You can set `lazy_offload: false` in the `invokeai.yaml` config file and the app will actively offload models.
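For reference, a minimal `invokeai.yaml` fragment with that setting; the `vram` value shown is an illustrative cache bound, not a recommendation, and the exact surrounding keys depend on your InvokeAI version:

```yaml
# invokeai.yaml — model cache settings
lazy_offload: false   # actively offload models instead of offloading lazily
vram: 0.25            # illustrative VRAM cache bound in GB (assumption)
```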
It still gobbles up some VRAM when sitting idle. Is it not possible to fully unload?
It only offloads down to the configured VRAM cache setting. There's no way at the moment to forcibly offload all models.
I think it'd be fairly straightforward to add an endpoint to do this, open to contributions.
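A minimal sketch of the idle-timeout behavior requested in this thread, using only stdlib threading. `unload_fn` stands in for whatever call would actually empty Invoke's model cache — a hypothetical hook, not an existing InvokeAI API:

```python
import threading


class IdleUnloader:
    """Fire an unload callback after a period with no generation activity."""

    def __init__(self, unload_fn, timeout_s=300.0):
        self.unload_fn = unload_fn  # hypothetical "empty the model cache" hook
        self.timeout_s = timeout_s
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        # Call at the end of every generation: restart the idle countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout_s, self.unload_fn)
            self._timer.daemon = True
            self._timer.start()


# Demo with a short timeout: the callback fires after ~0.1 s of idle time.
unloaded = threading.Event()
u = IdleUnloader(unloaded.set, timeout_s=0.1)
u.touch()                  # a generation just finished
unloaded.wait(2)
print(unloaded.is_set())   # True
```

Each completed generation calls `touch()`, which cancels and restarts the timer, so the unload only happens after a genuinely idle stretch — the same shape as Ollama's 5-minute default.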
Is there an existing issue for this?
Contact Details
What should this feature add?
A command-line option to unload the model from RAM after a defined period of time.
Alternatives
Running as a container and using Sablier to shut down the container after some time. This has the downside that if traffic isn't seen through the web interface, the container will be shut down even if jobs are running.
Additional Content
No response