JamborJan opened 8 months ago
It is important to note that if you are running Docker on a VM, you have to ensure that the AVX2 CPU feature is enabled. You can check this with `grep avx2 /proc/cpuinfo`; if there is no output, the required feature is not available. To solve that, you can adjust the hardware settings of the VM and choose the CPU type `host`. After that I was able to run the test.
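For reference, a minimal sketch of that check and of the VM setting, assuming a libvirt/QEMU host (other hypervisors have an equivalent CPU passthrough option):

```bash
# Inside the VM: check whether the AVX2 flag is exposed to the guest.
# Prints "avx2" once if the feature is available, nothing otherwise.
grep -o -m1 avx2 /proc/cpuinfo

# On the hypervisor (assumption: a libvirt/QEMU host managed with virsh),
# expose the host CPU flags to the guest, then power-cycle the VM:
#   virsh edit <vm-name>     # set <cpu mode='host-passthrough'/>
```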
But as I have a GPU installed, it would be beneficial to have the GPU used: the CPU spikes to 1600% when a prompt is sent.
According to the docs, a different image should be used in that case:
docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
This is discussed here: #26
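Before switching to the CUDA image it is worth verifying that Docker can see the GPU at all. A quick check, assuming the NVIDIA Container Toolkit is already installed on the host (the CUDA image tag is only an example):

```bash
# Should print the usual nvidia-smi table; if it errors out, fix the
# NVIDIA Container Toolkit setup before trying the GPU-enabled LocalAI image.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```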
Upgrading to 8.0 broke my functioning setup, and upgrading to 8.1 doesn't change anything. I can get responses from the Assistant for text generation, but it appears to be severely limited and doesn't understand the tasks defined. Image generation is entirely non-functional. I am running via Docker on Ubuntu.
Hi, can you check if it works now? I changed the Docker tag to v2.16.0-aio-cpu in https://github.com/szaimen/aio-local-ai/pull/41 and pushed a new container update.
For me this still does not work with the latest version of everything.
The error logs look similar to this:
11:06AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
11:07AM INF Success ip=127.0.0.1 latency="201.922µs" method=GET status=200 url=/readyz
11:07AM INF Trying to load the model 'gpt-3.5-turbo' with the backend '[llama-cpp llama-ggml gpt4all llama-cpp-fallback piper rwkv stablediffusion whisper huggingface bert-embeddings /build/backend/python/transformers/run.sh /build/backend/python/vllm/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/bark/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/petals/run.sh /build/backend/python/mamba/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/coqui/run.sh]'
11:07AM INF [llama-cpp] Attempting to load
11:07AM INF Loading model 'gpt-3.5-turbo' with backend llama-cpp
11:07AM INF [llama-cpp] attempting to load with AVX2 variant
11:07AM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
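The underlying llama-cpp error is truncated in that output. One way to surface it (a sketch, not something taken from this thread) is to run the container with LocalAI's debug logging enabled:

```bash
# Re-run with DEBUG=true so the backend prints the full llama-cpp failure reason.
# Image tag, port and container name are assumptions; match them to your setup.
docker run -p 8080:8080 -e DEBUG=true --name local-ai -ti localai/localai:latest-aio-cpu
```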
I do have AVX2 on my CPU, and the QEMU CPU type is set to 'host'.
I have set up the local-ai container as described and downloaded the suggested models from the main readme. Whenever I run a request through Nextcloud's AI Assistant or via the local command line, I get this error in the container logs:
When running a test within the container:
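(The test is essentially the OpenAI-compatible chat completion request from the LocalAI quickstart; the exact command was not preserved here, so the model name and prompt below are placeholders:)

```bash
# Minimal chat-completions request against the LocalAI container.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "How are you?"}]}'
```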
I get this error:
I found an issue related to that in the upstream repo: https://github.com/mudler/LocalAI/issues/771#issuecomment-1985588511
As I am not sure where the root cause lies, I am creating this issue here so that others can find it if they run into the same problem.