mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License
23.64k stars 1.81k forks

Feature: CLI command to preload models #491

Open sebastien-prudhomme opened 1 year ago

sebastien-prudhomme commented 1 year ago

Is your feature request related to a problem? Please describe.

Context:

It's hard to guess a good value for the liveness probe timeout: it really depends on the number of models and their file sizes. Without a good value, Kubernetes keeps restarting the pod.
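For context, the knobs involved look roughly like this (a sketch with placeholder values; the health endpoint path and the thresholds are assumptions, not LocalAI defaults). The restart budget is effectively `initialDelaySeconds + periodSeconds × failureThreshold`, which is hard to size when download time varies per deployment:

```yaml
livenessProbe:
  httpGet:
    path: /healthz            # placeholder health endpoint
    port: 8080
  initialDelaySeconds: 60     # must outlast the whole model download
  periodSeconds: 10
  failureThreshold: 3         # restart after 3 consecutive failures,
                              # i.e. ~60s + 3×10s before the pod is killed
```

If the models take longer than that budget to download, the kubelet restarts the container and the download starts over, which is the restart loop described above.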

Describe the solution you'd like

An option to launch the local-ai binary that only preloads the models in the foreground, without starting the server, and exits once all models are downloaded.

This command could be called in a Kubernetes "init container", launched before the main container, with no liveness probes defined.

Describe alternatives you've considered

Writing a shell script that starts the server, calls the API to load the models, watches for the download jobs to finish, then stops the server.
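The alternative above could be sketched roughly as follows. This is a hedged sketch, not a tested script: the `/models/apply` and `/models/jobs/<uuid>` endpoints, their JSON shape, and `MODEL_URL` are assumptions about LocalAI's model-gallery API, not verified against a specific version. The polling loop takes its status command as an argument, so the flow is easy to adapt:

```shell
#!/bin/sh
# Sketch of the workaround: start the server, ask it to apply a model,
# poll the job until it finishes, then stop the server.
# Assumptions (placeholders, not verified): the /models/apply and
# /models/jobs/<uuid> endpoints and the "processed" status field.

# Poll a status command until it prints "done". The command is injected
# as an argument so the loop can be exercised without a running server.
wait_until_done() {
  check_cmd="$1"
  while [ "$("$check_cmd")" != "done" ]; do
    echo "still downloading..." >&2
    sleep 5
  done
}

# Full workaround flow; call this function to actually run it.
preload_then_stop() {
  local-ai &                      # start the server in the background
  server_pid=$!

  # Ask the server to download/apply a model (MODEL_URL is a placeholder)
  uuid=$(curl -s http://localhost:8080/models/apply \
           -H "Content-Type: application/json" \
           -d "{\"url\": \"$MODEL_URL\"}" \
         | sed -n 's/.*"uuid":"\([^"]*\)".*/\1/p')

  # Report "done" once the job's "processed" field is true
  check_job() {
    if curl -s "http://localhost:8080/models/jobs/$uuid" \
       | grep -q '"processed":true'; then
      echo done
    else
      echo pending
    fi
  }

  wait_until_done check_job
  kill "$server_pid"              # stop the server once downloads finish
}
```

The moving parts this script has to get right (waiting on jobs, cleanly stopping the server) are exactly what a built-in preload command would make unnecessary.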

Additional context

mudler commented 1 year ago

great idea @sebastien-prudhomme ! totally in favor of it :+1:

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful :smile_cat: and I can generally provide good tips, or point you to the right places in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

To preload models in a Kubernetes pod, you can use the "preload" command in LocalAI. This command downloads and loads the specified models into memory, and then exits the process. You can use this command in an init container to preload the models before starting the main container with the server. Here's an example of how you could use the "preload" command in a Kubernetes pod:

# Define the PodSpec for the init container,
# which waits until all model downloads have finished
initContainers:
  - name: preload
    image: busybox  # note: inotifywait needs inotify-tools; stock busybox ships inotifyd instead
    command:
      - sh
      - -c
      - |
        # Wait until all download jobs are finished
        inotifywait -r /path/to/models/ -e modify | while read path action file; do
          echo "Waiting for $file to finish downloading..."
          sleep 10
        done
        echo "All models downloaded."
        exit 0
# Define the PodSpec for the main container,
# started once the init container exits successfully
containers:
  - name: localai
    image: go-skynet/localai:latest
    command:
      - preload
      - --models=model1.bin,model2.bin
    args:
      - --threads=4
      - --prompt=my_prompt

This init container will wait until all download jobs are finished, and then exit with a success code. The main container can then be started as usual with the "preload" command. Note that you may need to adjust the number of threads and prompt configuration based on your specific use case.

Sources: