mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

A method to unload model(s) would be very useful. #3632

Open j4ys0n opened 1 month ago

j4ys0n commented 1 month ago

Is your feature request related to a problem? Please describe.

When running multiple distributed workers, if I have to change or restart the service on one worker, I have to bring the entire cluster down. Restarting a single worker only unloads the models on that worker; the VRAM usage on the other workers does not change. (Separate issue, but restarting a worker also results in a new worker ID.)

Describe the solution you'd like

An API endpoint to unload individual models, and another to unload all models.

Describe alternatives you've considered

I'm not sure how else to address this issue.

Additional context

mudler commented 1 month ago

To clarify here @j4ys0n - are you referring to unloading models from a group of federated workers, or to llama.cpp workers?

JFYI we have /backend/shutdown for unloading a single model, but indeed that does not propagate to all federated workers.
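For reference, something along these lines should hit it. A minimal sketch in Go, assuming the endpoint accepts a JSON body with a `model` field naming the backend to stop (double-check the exact payload against the docs for your version):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Assumption: /backend/shutdown accepts a JSON body naming the model
	// whose backend should be stopped. Verify the payload shape against
	// your LocalAI version before relying on this.
	body := bytes.NewBufferString(`{"model": "my-model"}`)
	resp, err := http.Post("http://localhost:8080/backend/shutdown", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```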

j4ys0n commented 1 month ago

Both federated and llama.cpp workers. There should be a way to unload models from workers without having to restart the services. The same goes for removing workers from the cluster: there should be a way to remove a worker without the coordinator thinking one is missing. Something like the hypothetical fan-out sketched below would cover the unload case.
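A rough sketch of what I mean, assuming the coordinator knows its workers' base URLs and each worker exposes the same per-model shutdown endpoint. The worker list, URLs, and `unloadEverywhere` helper are placeholders for illustration, not the real LocalAI API:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// unloadEverywhere fans a single model-unload request out to every known
// worker. The workers slice and the /backend/shutdown path on each worker
// are hypothetical placeholders; today this propagation does not exist.
func unloadEverywhere(workers []string, model string) {
	for _, w := range workers {
		body := bytes.NewBufferString(fmt.Sprintf(`{"model": %q}`, model))
		resp, err := http.Post(w+"/backend/shutdown", "application/json", body)
		if err != nil {
			fmt.Println(w, "error:", err)
			continue
		}
		resp.Body.Close()
		fmt.Println(w, "->", resp.Status)
	}
}

func main() {
	unloadEverywhere([]string{"http://worker-1:8080", "http://worker-2:8080"}, "my-model")
}
```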

Nyralei commented 1 month ago

I made a similar issue: https://github.com/mudler/LocalAI/issues/3378

levidehaan commented 4 days ago

So I'm digging into this issue. Any thoughts on what I should go look at, @mudler?

Anywho, if I see something I'll say something :D