j4ys0n opened this issue 1 month ago · Open
To clarify here @j4ys0n - are you referring to unloading models from a group of federated workers, or to llama.cpp workers?
JFYI we have `/backend/shutdown` for unloading a single model, but indeed that does not propagate to all federated workers.
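For reference, a minimal sketch of calling that endpoint, assuming a LocalAI instance listening on `localhost:8080` and that `/backend/shutdown` accepts the model name in a JSON body (the host and model name here are placeholders):

```python
import requests

# Sketch: unload a single model via the existing endpoint.
# Assumes the model name goes in a JSON body; adjust host/model as needed.
resp = requests.post(
    "http://localhost:8080/backend/shutdown",
    json={"model": "my-model"},
)
resp.raise_for_status()
# Note: per the discussion above, this only unloads the model on the
# node that receives the request, not on other federated workers.
```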
Both federated and llama.cpp workers. There should be a way to unload models from workers without having to restart the services. Same with removing workers from the cluster: there should be a way to remove a worker without the coordinator thinking one is missing.
I made a similar issue: https://github.com/mudler/LocalAI/issues/3378
So I'm digging into this issue. Any thoughts on what I should go look at, @mudler?
Anywho, if I see something I'll say something :D
Is your feature request related to a problem? Please describe.
When running multiple distributed workers, if I have to change or restart the service on a worker, I have to bring the entire cluster down. Restarting one worker only unloads the models on that worker itself; the VRAM on the other workers does not change. (Separate issue, but restarting a worker also results in a new worker ID.)
Describe the solution you'd like
An API endpoint to unload individual models and to unload all models, cluster-wide (see the sketch below).
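For illustration, a hypothetical shape such endpoints could take. None of these routes or parameters exist in LocalAI today; the paths, the `propagate` flag, and the worker ID are made up for this sketch:

```python
import requests

BASE = "http://localhost:8080"  # placeholder coordinator address

# Hypothetical: unload one model on every worker in the cluster,
# not just on the node that receives the request.
requests.post(f"{BASE}/backend/shutdown", json={"model": "my-model", "propagate": True})

# Hypothetical: unload all models across the cluster in one call.
requests.post(f"{BASE}/backend/shutdown/all")

# Hypothetical: deregister a worker cleanly, so the coordinator stops
# reporting it as missing after a restart.
worker_id = "example-worker-id"  # placeholder
requests.delete(f"{BASE}/cluster/workers/{worker_id}")
```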
Describe alternatives you've considered
I'm not sure how else to address this issue.
Additional context