mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. Generates text, audio, video, and images, with voice cloning capabilities.
https://localai.io
MIT License

localai:v2.10.1-cublas-cuda12-core "Fails: grpc process not found" #1871

Open testKKP opened 3 months ago

testKKP commented 3 months ago

LocalAI version:

localai:v2.10.1-cublas-cuda12-core

Environment, CPU architecture, OS, and Version:

Windows 11, RTX 4090, CUDA 12.3

2024-03-21 19:23:56 11:23AM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
2024-03-21 19:23:56 11:23AM INF [/build/backend/python/coqui/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS

Describe the bug

I just ran the command "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct" and sent a test prompt to port 8080. It then returned the error shown above.

To Reproduce

The command I sent is:

  curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d "{ \"model\": \"mixtral-instruct\", \"prompt\": \"How are you doing?\" }"

Expected behavior

Logs

Additional context

fakezeta commented 3 months ago

Hi @testKKP, can you post your model definition? The core images do not include Python, so SentenceTransformers and Coqui are not available. For your reference, the backends that require Python are:

ThaDrone commented 3 months ago

Same issue here, using the new all-in-one image

 docker run -p 8080:8080 --gpus all --name local-ai -ti quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12

Request:

{ "model": "gpt-4", "messages": [{"role": "user",     
  "content": "How are you doing?", "temperature": 0.1}]}

Response:

{
    "error": {
        "code": 500,
        "message": "rpc error: code = Unknown desc = unimplemented",
        "type": ""
    }
}

Running Fedora Linux + Nvidia RTX3060

fakezeta commented 3 months ago

Hi @ThaDrone, this is a different issue: gpt-4 from the AIO images uses the llama.cpp backend, while it seems that @testKKP is using a Python (transformers?) backend. Can you open a new issue with the output of docker logs local-ai?
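
For anyone following along, those logs can be captured from the named container like this (a minimal example; local-ai is the name given by the --name flag in the run command above):

  # dump the container's recent output to a file that can be attached to the new issue
  docker logs --tail 500 local-ai > local-ai.log 2>&1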

testKKP commented 3 months ago

> Hi @ThaDrone, this is a different issue: gpt-4 from the AIO images uses the llama.cpp backend, while it seems that @testKKP is using a Python (transformers?) backend. Can you open a new issue with the output of docker logs local-ai?

Thanks for your reply. In fact, I don't know my model definition. I just ran the command provided on the official website, like this: "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct". Everything else was done by Docker itself, so I don't know how to solve it.

fakezeta commented 3 months ago

There is a Mixtral configuration in the examples directory: https://github.com/mudler/LocalAI/tree/master/examples/configurations/mixtral. Download those files into your models directory together with the GGUF file. The example uses mixtral-8x7b-instruct-v0.1.Q2_K.gguf; you can choose whichever quantization you prefer, just remember to update the config.
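
For readers hitting the same wall, a rough command-line sketch of that setup follows. The file names, download URLs, and the /models mount are assumptions based on the linked example directory, not something confirmed in this thread; check the actual directory contents before copying.

  # Sketch only: file names and URLs below are assumptions, verify them against
  # the examples/configurations/mixtral directory linked above.
  mkdir -p models && cd models

  # model configuration and chat template from the examples directory
  wget https://raw.githubusercontent.com/mudler/LocalAI/master/examples/configurations/mixtral/mixtral.yaml
  wget https://raw.githubusercontent.com/mudler/LocalAI/master/examples/configurations/mixtral/mixtral-chat.tmpl

  # the GGUF weights the example config refers to (any quantization works,
  # as long as the config is updated to match the file name)
  wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q2_K.gguf
  cd ..

  # mount the models directory into the container so LocalAI can see the
  # config and the weights, and point it there explicitly
  docker run -ti -p 8080:8080 --gpus all \
    -v "$PWD/models:/models" \
    localai/localai:v2.10.1-cublas-cuda12-core \
    --models-path /models

Requests would then reference whatever name: the YAML defines as the model in the completion call shown earlier.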

localai-bot commented 2 months ago

As a bot (experiment of @mudler), I would like to provide some guidelines to resolve the issue:

  1. Make sure Docker is properly installed and running on your system.
  2. Download the mixtral-instruct configuration files from the relevant GGUF link. You can choose any available GGUF file for mixtral-instruct.
  3. Save the downloaded files in the models directory of your LocalAI installation.
  4. Stop the currently running Docker container if it's still running. You can use the command: docker stop local-ai
  5. Run the following command, replacing [gguf_file] with the name of the file you downloaded in step 2:
    docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct-[gguf_file]

Please replace [gguf_file] in the command with the actual filename you downloaded in step 2. If you followed the instructions in the example, the command should look like this:

docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-8x7b-instruct-v0.1.Q2_K.gguf

If you need further assistance, please provide the output of the command and any additional logs while running LocalAI in debug mode (--debug or DEBUG=true).
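
As a concrete example of that last point, debug logging can be enabled on the same container run discussed in this thread by setting the environment variable the bot refers to (a sketch; adjust the image tag and model name to your setup):

  # same invocation as before, with debug output enabled via DEBUG=true
  docker run -ti -p 8080:8080 --gpus all \
    -e DEBUG=true \
    localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct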

qingfenghcy commented 2 months ago

> Same issue here, using the new all-in-one image
>
>   docker run -p 8080:8080 --gpus all --name local-ai -ti quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12
>
> Request:
>
> {
>   "model": "gpt-4",
>   "messages": [
>     { "role": "user", "content": "How are you doing?", "temperature": 0.1 }
>   ]
> }
>
> Response:
>
> {
>     "error": {
>         "code": 500,
>         "message": "rpc error: code = Unknown desc = unimplemented",
>         "type": ""
>     }
> }
>
> Running Fedora Linux + Nvidia RTX3060

Hello, I also had a similar problem, did you solve it?