mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Automatically sync model folder #3216

Open mudler opened 1 month ago

mudler commented 1 month ago

Is your feature request related to a problem? Please describe. When creating federated networks, the nodes currently need to have the same models installed, or rely on the fact that LocalAI automatically installs models that are available in the gallery on the first request.

Describe the solution you'd like A way to sync the model folders between federated LocalAI instances.

Describe alternatives you've considered N/A

Additional context
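
For illustration only, here is a minimal sketch of one shape such a sync could take: each node exposes a checksum manifest of its models folder, and peers pull whatever they are missing. The endpoint, port and helpers below are hypothetical and not part of LocalAI today.

```go
// Hypothetical sketch: pull-based sync of a models folder between nodes.
// None of these endpoints or helpers exist in LocalAI; this only illustrates the idea.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"io"
	"net/http"
	"os"
	"path/filepath"
)

// manifest maps a file name (relative to the models folder) to its sha256.
type manifest map[string]string

// buildManifest walks modelsDir and hashes every regular file.
func buildManifest(modelsDir string) (manifest, error) {
	m := manifest{}
	err := filepath.Walk(modelsDir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		rel, _ := filepath.Rel(modelsDir, path)
		m[rel] = hex.EncodeToString(h.Sum(nil))
		return nil
	})
	return m, err
}

// serveManifest exposes the manifest so that peers can diff against it.
func serveManifest(modelsDir string) {
	http.HandleFunc("/sync/manifest", func(w http.ResponseWriter, r *http.Request) {
		m, err := buildManifest(modelsDir)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(m)
	})
	// Serving the raw model files is left out; a real implementation would also
	// need authentication and resumable downloads.
}

// missingFiles returns the entries present on the peer but absent or changed locally.
func missingFiles(local, remote manifest) []string {
	var out []string
	for name, sum := range remote {
		if local[name] != sum {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	serveManifest("./models")
	http.ListenAndServe(":9090", nil)
}
```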

lunamidori5 commented 1 month ago

Adding myself to this issue to watch it

dave-gray101 commented 1 month ago

One idea that comes to mind: we should generate a gallery file on the host and automatically share that out to each worker.

That way, we can leverage the existing downloader support and present it as a "suggested models" prompt when connecting a worker? If the worker is non-interactive, we can have flags or commands to download one or all of the models in the gallery.
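
To make the first half of that idea concrete, here is a rough sketch of generating a gallery-style file from whatever the host has installed. The hidden file name and the entry fields are assumptions for illustration, not LocalAI's actual gallery schema.

```go
// Hypothetical sketch: generate a minimal gallery-style YAML file listing the
// models installed on the host, so workers could be pointed at it.
// The file name and fields are illustrative, not LocalAI's real schema.
package main

import (
	"os"
	"path/filepath"

	"gopkg.in/yaml.v3"
)

type galleryEntry struct {
	Name string `yaml:"name"`
	File string `yaml:"file"`
}

func main() {
	modelsDir := "./models"
	var entries []galleryEntry

	// Collect every .gguf file in the models folder as a gallery entry.
	matches, err := filepath.Glob(filepath.Join(modelsDir, "*.gguf"))
	if err != nil {
		panic(err)
	}
	for _, path := range matches {
		entries = append(entries, galleryEntry{
			Name: filepath.Base(path),
			File: filepath.Base(path),
		})
	}

	out, err := yaml.Marshal(entries)
	if err != nil {
		panic(err)
	}
	// Write a hidden gallery file next to the models, to be shared with workers.
	if err := os.WriteFile(filepath.Join(modelsDir, ".host-gallery.yaml"), out, 0o644); err != nil {
		panic(err)
	}
}
```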

lee-b commented 1 month ago

I'd like to advocate for a single/elected background downloader service (if not already present; it doesn't seem to be from the startup behavior I've seen so far):

Separating the downloading part from LocalAI would:

For completeness, here's some thinking on the alternatives part of the bug report / feature request:

Describe alternatives you've considered

In order of complexity:

Local file copies, network proxies for downloads

Third-party sync tools (like syncthing)

Network filesystem with some form of distributed locking (see the locking sketch after this list)

Proper network filesystems, k8s, etc.
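
To make the "distributed locking" alternative above a bit more concrete, here is a minimal, Unix-only sketch of advisory locking over a shared models mount. The mount path and lock file are hypothetical, and whether flock actually coordinates across nodes depends entirely on the network filesystem and its lock support.

```go
// Sketch: advisory locking on a shared models folder, so that only one node
// downloads a given model at a time. Unix-only; the lock file path is hypothetical,
// and flock semantics over network filesystems vary.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// One lock file per model (path is illustrative).
	f, err := os.OpenFile("/mnt/shared-models/.download.lock", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Block until we hold the exclusive lock, then download, then release.
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		log.Fatal(err)
	}
	defer unix.Flock(int(f.Fd()), unix.LOCK_UN)

	log.Println("lock acquired; safe to download the model here")
}
```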

mudler commented 1 month ago

> I'd like to advocate for a single/elected background downloader service (if not already present; it doesn't seem to be from the startup behavior I've seen so far):

There is one already: it's the component in charge of downloading/applying models at runtime via the API endpoint.
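
For reference, that runtime install path can be driven over HTTP. A minimal sketch follows; the endpoint path and payload shape are assumptions and may differ between LocalAI versions.

```go
// Sketch: asking a running LocalAI instance to install a model at runtime.
// The endpoint path and payload shape are assumptions based on the model
// gallery API and may differ between versions.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	body := bytes.NewBufferString(`{"id": "localai@some-model"}`) // hypothetical gallery id
	resp, err := http.Post("http://localhost:8080/models/apply", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```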

> One idea that comes to mind: we should generate a gallery file on the host and automatically share that out to each worker.

Good point, we already generate these files (they are hidden, prefixed with a dot); however, we would need to make sure hot-reloading of the configurations is supported.
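
A minimal sketch of what hot-reloading those generated, dot-prefixed files could look like, using a filesystem watcher (fsnotify). The reload hook itself is a placeholder, not existing LocalAI code.

```go
// Sketch: watch the models folder and trigger a (placeholder) configuration
// reload when the hidden, dot-prefixed generated files change.
// Uses fsnotify; reloadConfigs is hypothetical.
package main

import (
	"log"
	"path/filepath"
	"strings"

	"github.com/fsnotify/fsnotify"
)

func reloadConfigs(path string) {
	// Placeholder: a real implementation would re-read the generated
	// gallery/config file and update the in-memory model list.
	log.Println("reloading configuration from", path)
}

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	if err := watcher.Add("./models"); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return
			}
			// Only react to the hidden, dot-prefixed generated files.
			if strings.HasPrefix(filepath.Base(event.Name), ".") &&
				event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
				reloadConfigs(event.Name)
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Println("watch error:", err)
		}
	}
}
```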