Open sebastien-prudhomme opened 1 year ago
great idea @sebastien-prudhomme! totally in favor of it :+1:
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point out places to look in the documentation or the code, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
To preload models in a Kubernetes pod, you can use the "preload" command in LocalAI. This command downloads and loads the specified models into memory, and then exits the process. You can use this command in an init container to preload the models before starting the main container with the server. Here's an example of how you could use the "preload" command in a Kubernetes pod:
```yaml
# Define the PodSpec for the main container
containers:
- name: localai
  image: go-skynet/localai:latest
  command:
  - preload
  - --models=model1.bin,model2.bin
  args:
  - --threads=4
  - --prompt=my_prompt
# Define the PodSpec for the init container
initContainers:
- name: preload
  image: busybox   # note: stock busybox does not ship inotifywait (inotify-tools)
  command:
  - sh
  - -c
  - |
    # Wait until all download jobs are finished
    inotifywait -r /path/to/models/ -e modify | while read path action file; do
      echo "Waiting for $file to finish downloading..."
      sleep 10
    done
    echo "All models downloaded."
    exit 0
```
This init container will wait until all download jobs are finished, and then exit with a success code. The main container can then be started as usual with the "preload" command. Note that you may need to adjust the number of threads and prompt configuration based on your specific use case.
Is your feature request related to a problem? Please describe.
Context:
It's hard to guess a good value for the liveness probe timeout; it really depends on the number of models and their file sizes. Without a good value, Kubernetes keeps restarting the pod.
Describe the solution you'd like
An option to launch the local-ai binary that just preloads the models in the foreground, without starting the server, and exits when all models are downloaded.
This command could be called in a Kubernetes "init container", launched before the main container, with no liveness probes defined.
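For illustration, here is a rough sketch of how such an init container could be wired up. The `--preload-only` flag is hypothetical (it is the option being requested, not an existing CLI flag), and the image name and volume layout are illustrative:

```yaml
# Hypothetical sketch: --preload-only is the requested option, it does not exist yet.
initContainers:
- name: preload-models
  image: quay.io/go-skynet/local-ai:latest   # image name illustrative
  command:
  - local-ai
  - --preload-only        # hypothetical flag: download/preload the models, then exit
  volumeMounts:
  - name: models
    mountPath: /models
containers:
- name: local-ai
  image: quay.io/go-skynet/local-ai:latest
  # no liveness probe tuning needed here: the models are already on the volume
  volumeMounts:
  - name: models
    mountPath: /models
volumes:
- name: models
  emptyDir: {}
```

With this split, the liveness probe on the main container only has to cover server startup, not the model downloads.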
Describe alternatives you've considered
Writing a shell script that starts the server, calls the API to load the models, watches for the download jobs to finish, and then stops the server.
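For reference, a minimal sketch of that workaround follows. It assumes LocalAI's model gallery endpoints (`POST /models/apply` returning a job `uuid`, and `GET /models/jobs/<uuid>` reporting a `processed` field) and the default port 8080; the exact paths, response fields, flag names, and the example gallery URL are assumptions to verify against the LocalAI documentation:

```sh
#!/bin/sh
# Sketch of the workaround: start the server, trigger the downloads via the
# API, wait for the jobs to finish, then stop the server.

local-ai --models-path /models &    # flag name to verify against the docs
SERVER_PID=$!

# Wait for the API to come up
until curl -sf http://localhost:8080/models > /dev/null; do sleep 1; done

# Ask the server to apply (download) a model and capture the job id
# (the gallery URL is only an example)
JOB=$(curl -s http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml"}' \
  | sed -n 's/.*"uuid": *"\([^"]*\)".*/\1/p')

# Poll the job until it reports as processed
until curl -s "http://localhost:8080/models/jobs/$JOB" | grep -q '"processed": *true'; do
  echo "Waiting for model download job $JOB..."
  sleep 10
done

# Stop the server once the downloads are done
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null
```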
Additional context