szaimen / aio-local-ai

GNU Affero General Public License v3.0

rpc error code #25

Open · JamborJan opened this issue 6 months ago

JamborJan commented 6 months ago

I have set up the local-ai container as described and downloaded the suggested models from the main readme. Whenever I run a request through Nextcloud's AI Assistant or via the local command line, I get this error in the container logs:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43575: connect: connection refused"

When running a test within the container:

LOCALAI=http://localhost:8080

curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j", 
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 2 
   }'

I get this error:

{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}

I found an issue related to that in the upstream repo: https://github.com/mudler/LocalAI/issues/771#issuecomment-1985588511

Since I am not sure where the root cause lies, I am filing this issue here so that others who encounter the same problem can find it.

JamborJan commented 6 months ago

It is important to note that if you are running Docker in a VM, you have to ensure that the AVX2 CPU feature is enabled. You can check this with grep avx2 /proc/cpuinfo. If there is no output, the required feature is not available. To solve this, adjust the hardware settings of the VM and choose the CPU type host. After doing that, I was able to run the test; a check script along these lines is sketched below.
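A minimal sketch of that check (the Proxmox qm command is an assumption about the hypervisor; adjust for your platform):

# Inside the VM: verify that the AVX2 flag is exposed to the guest.
if grep -q avx2 /proc/cpuinfo; then
    echo "AVX2 available"
else
    echo "AVX2 missing - set the VM CPU type to 'host' on the hypervisor"
fi

# On a Proxmox host (assumption; other hypervisors configure this
# differently), something like:
#   qm set <vmid> --cpu host
# then power-cycle the VM so the new CPU model takes effect.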

But as I have a GPU installed, it would be beneficial to have the GPU used. The CPU spikes to 1600% when a prompt is sent.

According to the docs, a different image should be used in that case:

docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
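To confirm the container can actually see the GPU (a quick check, not from the docs), one can run nvidia-smi inside it; this assumes the NVIDIA container toolkit is installed on the host:

# Check that the NVIDIA runtime passes the GPU through to the container.
docker exec -ti local-ai nvidia-smi

# If this fails, verify the host side first:
#   nvidia-smi                      # driver works on the host
#   docker info | grep -i nvidia    # nvidia runtime is registered with Docker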

This is discussed here: #26

bpoulliot commented 5 months ago

Upgrading to 8.0 broke my functioning setup, and upgrading to 8.1 doesn't change anything. I can get responses from Assistant for text generation, but it appears to be severely limited and doesn't understand the tasks defined. Image generation is entirely non-functional. Running via Docker on Ubuntu.

szaimen commented 4 months ago

Hi, can you check if it works now after I changed the docker tag to v2.16.0-aio-cpu with https://github.com/szaimen/aio-local-ai/pull/41 and pushed a new container update?
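For anyone verifying this, one way to confirm which image version is actually running after the update (the container name here is an assumption based on the usual AIO naming; adjust as needed):

# Show the image and tag the local-ai container was started from.
docker inspect --format '{{.Config.Image}}' nextcloud-aio-local-ai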

lexiconzero commented 3 months ago

For me, this still does not work with the latest version of everything.

Error logs similar to this:

11:06AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
11:07AM INF Success ip=127.0.0.1 latency="201.922µs" method=GET status=200 url=/readyz
11:07AM INF Trying to load the model 'gpt-3.5-turbo' with the backend '[llama-cpp llama-ggml gpt4all llama-cpp-fallback piper rwkv stablediffusion whisper huggingface bert-embeddings /build/backend/python/transformers/run.sh /build/backend/python/vllm/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/bark/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/petals/run.sh /build/backend/python/mamba/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/coqui/run.sh]'
11:07AM INF [llama-cpp] Attempting to load
11:07AM INF Loading model 'gpt-3.5-turbo' with backend llama-cpp
11:07AM INF [llama-cpp] attempting to load with AVX2 variant
11:07AM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =

I do have AVX2 on my CPU, and the QEMU CPU type is set to 'host'.
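For libvirt-managed QEMU VMs (an assumption; Proxmox and other platforms store this differently), the effective CPU model can be double-checked on the host and in the guest:

# On the host: confirm the domain really uses host passthrough.
virsh dumpxml <vm-name> | grep -A2 '<cpu'
# Expect something like: <cpu mode='host-passthrough' ...>

# In the guest: confirm the flag actually arrived.
grep -c avx2 /proc/cpuinfo   # prints the number of cores reporting AVX2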