mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

ERR Failed starting/connecting to the gRPC service #1721

Closed doug-wade closed 6 months ago

doug-wade commented 9 months ago

LocalAI version: 2.8.2

Environment, CPU architecture, OS, and Version:

» uname -a
Darwin Dougs-MacBook-Air.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:33:31 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T8112 arm64

Describe the bug

ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35053: connect: connection refused"

To Reproduce

» docker run -ti --platform linux/amd64 -p 8080:8080 localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf
» curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "codellama-7b-gguf",
     "messages": [{"role": "user", "content": "Please write a function that calculates the first n prime numbers."}],
     "temperature": 0.9
   }'

Expected behavior

To return a code snippet.

Logs

» docker run -ti --platform linux/amd64 -p 8080:8080 localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf --debug
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@
6:55PM DBG no galleries to load
6:55PM INF Starting LocalAI using 4 threads, with models path: /build/models
6:55PM INF LocalAI version: v2.8.2 (e690bf387a27de277368e2f742a616e1b2600d5b)
6:55PM WRN [startup] failed resolving model '--debug'
6:55PM INF Preloading models from /build/models
6:55PM INF Downloading "https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q4_K_M.gguf"
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 321.4 MiB/3.8 GiB (8.26%) ETA: 55.553743882s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 711.8 MiB/3.8 GiB (18.29%) ETA: 44.680439228s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.0 GiB/3.8 GiB (27.58%) ETA: 39.397551322s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.4 GiB/3.8 GiB (37.36%) ETA: 33.538813589s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 1.8 GiB/3.8 GiB (47.02%) ETA: 28.171281943s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.2 GiB/3.8 GiB (56.74%) ETA: 22.877781731s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.5 GiB/3.8 GiB (66.86%) ETA: 17.34760848s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 2.9 GiB/3.8 GiB (76.16%) ETA: 12.522402138s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 3.3 GiB/3.8 GiB (86.14%) ETA: 7.240267439s
6:55PM INF Downloading /build/models/232692e1614183192beee756c58afefc.partial: 3.6 GiB/3.8 GiB (95.45%) ETA: 2.38336908s
6:55PM INF File "/build/models/232692e1614183192beee756c58afefc" downloaded and verified
6:55PM INF Model name: codellama-7b-gguf
6:55PM INF Model usage:
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
    "model": "codellama-7b-gguf",
    "prompt": "import socket\n\ndef ping_exponential_backoff(host: str):"
}'

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.50.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 73  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................ 53 │
 └───────────────────────────────────────────────────┘

6:55PM INF Loading model '232692e1614183192beee756c58afefc' with backend transformers
6:56PM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32949: connect: connection refused"

Additional context

This is my first time trying to start the project.
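
A side note on the reproduction above: --debug appears after the model name in the docker run command, so startup treats it as a model to preload (hence the WRN [startup] failed resolving model '--debug' line in the log). Debug logging is typically enabled with the DEBUG=true environment variable instead; a sketch of the same run with it set:

» docker run -ti --platform linux/amd64 -p 8080:8080 \
    -e DEBUG=true \
    localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf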

doug-wade commented 9 months ago

I followed the community instructions for setting it up via docker compose, and I am still getting gRPC service errors:

» curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "codellama-7b-gguf",
     "messages": [{"role": "user", "content": "Please write a function that calculates the first n prime numbers."}],
     "temperature": 0.9
   }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 24 errors occurred:\n\t* grpc service not ready\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* grpc service not ready\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* grpc service not ready\n\t* could not load model: rpc error: code = Unknown desc = stat /models/codellama-7b-gguf: no such file or directory\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* could not load model: rpc error: code = Unknown desc = unsupported model type /models/codellama-7b-gguf (should end with .onnx)\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\n","type":""}}%
doug-wade commented 9 months ago

This seems to be the error that is output when you request a model that does not exist. I had my model file in the root directory of the project instead of /models, which is where LocalAI is configured to search by default. I'm not sure whether to leave this open for a better error message in this case, or to close it because it was caused by user error.
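
For anyone hitting the same thing, a minimal sketch of a working layout, assuming the downloaded model file sits in the current directory and the image reads models from /build/models (as the startup log above shows; other images use /models instead):

» mkdir -p models
» mv codellama-7b.Q4_K_M.gguf models/
» docker run -ti --platform linux/amd64 -p 8080:8080 \
    -v "$PWD/models:/build/models" \
    localai/localai:v2.8.2-ffmpeg-core codellama-7b-gguf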

sesirbu commented 9 months ago

Did you solve it? I have the same error.

doug-wade commented 9 months ago

@sesirbu I managed to work around it by building from source. I think the trouble is that I'm on Apple Silicon and pulling the Docker image for linux/amd64, and some incompatibility is being surfaced. I followed this guide, and the binary runs without these errors.
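
For reference, the from-source route looks roughly like this (a sketch based on the build docs at https://localai.io/basics/build/; it assumes Go, cmake, and a C/C++ toolchain are installed):

» git clone https://github.com/mudler/LocalAI
» cd LocalAI
» make build
» ./local-ai --models-path ./models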

jonykalavera commented 9 months ago

I am also experiencing this issue.

8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:30PM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
8:31PM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45031: connect: connection refused"
8:31PM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36203: connect: connection refused"
...

using docker compose with this config:

version: "3.9"

name: emesabot

networks:
  emesabot:
    driver: host

services:
  localai:
    container_name: localai
    image: localai/localai:v2.6.1-cublas-cuda12-core
    command: llava phi-2 all-minilm-l6-v2
    volumes:
      - "./models:/build/models:cached"
    environment:
      - 'ADDRESS=0.0.0.0:8080'
      - 'GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]'
    ports:
      - "8080:8080"
    networks:
      - emesabot
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  anything-llm:
    container_name: anything-llm
    image: mintplexlabs/anythingllm:latest
    cap_add:
      - SYS_ADMIN
    volumes:
      - "./.env:/app/server/.env"
      - ".anything-llm/server/storage:/app/server/storage"
      - ".anything-llm/collector/hotdir/:/app/collector/hotdir"
      - ".anything-llm/collector/outputs/:/app/collector/outputs"
    user: "${UID:-1000}:${GID:-1000}"
    ports:
      - "3001:3001"
      - "8888:8888"
    env_file:
      - .env
    networks:
      - emesabot
    extra_hosts:
      - "host.docker.internal:host-gateway"
olariuromeo commented 9 months ago

You have to build the gRPC backends first if you want to use them, for example:

make GO_TAGS=stablediffusion,tts CUDA_LIBPATH=/usr/local/cuda FFMPEG=true BUILD_API_ONLY=false BUILD_TYPE=cuBLAS BUILD_GRPC_FOR_BACKEND_LLAMA=true build

https://localai.io/basics/build/
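
For context, the GO_TAGS list above corresponds to the "require LocalAI compiled with GO_TAGS" lines in the 500 error earlier in this thread: backends not included at build time are simply absent from the binary. This only applies to source builds; the published Docker images come prebuilt.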

thevops commented 7 months ago

I have the same error while using the Docker image and just a single model (docker-compose):

services:
  api:
    image: localai/localai:v2.12.4-ffmpeg-core
    command:
      - whisper-base