mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37785: connect: connection refused" #771

Closed. luoweb closed this issue 5 months ago.

luoweb commented 1 year ago

LocalAI version:

V1.21

root@63429046747f:/build# ./local-ai --version
LocalAI version 4548473 (4548473acf4f57ff149492272cc1fdba3521f83a)

Environment, CPU architecture, OS, and Version: Intel x86, CentOS

Describe the bug: gRPC error (connection refused) when loading the model

To Reproduce

Expected behavior: the request returns an output response

Logs

llmai-api-1 | 3:04AM DBG Loading model 'openllama7b' greedly
llmai-api-1 | 3:04AM DBG [llama] Attempting to load
llmai-api-1 | 3:04AM DBG Loading model llama from openllama7b
llmai-api-1 | 3:04AM DBG Loading model in memory from file: /models/openllama7b
llmai-api-1 | 3:04AM DBG Loading GRPC Model%!(EXTRA string=llama, model.Options={llama openllama7b 4 /tmp/localai/backend_data 0xc0000400b0 0xc000296a20})
llmai-api-1 | 3:04AM DBG Loading GRPC Process%!(EXTRA string=/tmp/localai/backend_data/backend-assets/grpc/llama)
llmai-api-1 | 3:04AM DBG GRPC Service for 'llama' (openllama7b) will be running at: 'localhost:37785'
llmai-api-1 | 3:04AM DBG GRPC Service Started
llmai-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37785: connect: connection refused"
llmai-api-1 | 3:04AM DBG GRPC(llama-openllama7b-localhost:37785): stderr 2023/07/19 03:04:00 gRPC Server listening at 127.0.0.1:37785
llmai-api-1 | 3:04AM DBG GRPC Service Ready
llmai-api-1 | 3:04AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:/models/openllama7b ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:}

Additional context

JamborJan commented 6 months ago

I am using Nextcloud All-in-One and the LocalAI container. I used the basic example there for the models:

# Stable Diffusion in NCNN with c++, supported txt2img and img2img 
- url: github:go-skynet/model-gallery/stablediffusion.yaml

# Port of OpenAI's Whisper model in C/C++ 
- url: github:go-skynet/model-gallery/whisper-base.yaml
  name: whisper-1

# A commercially licensable model based on GPT-J and trained by Nomic AI on the v0 GPT4All dataset.
- url: github:go-skynet/model-gallery/gpt4all-j.yaml
  name: gpt4all-j
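
(Editorial sketch, not part of the original comment: gallery entries like the ones above can also be applied at runtime through LocalAI's model-gallery API instead of a preload file. This assumes the API is reachable on port 8080; adjust the host/port to your setup.)

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
     "name": "gpt4all-j"
   }'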

The container is up and running with these logs:

CPU info:
model name  : Common KVM processor
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni cx16 x2apic hypervisor cmp_legacy 3dnowprefetch vmmcall
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
nc: getaddrinfo for host "nextcloud-aio-nextcloud" port 9001: Name or service not known
Waiting for nextcloud to start
nc: getaddrinfo for host "nextcloud-aio-nextcloud" port 9001: Name or service not known
Waiting for nextcloud to start
nc: getaddrinfo for host "nextcloud-aio-nextcloud" port 9001: Name or service not known
Waiting for nextcloud to start
nc: getaddrinfo for host "nextcloud-aio-nextcloud" port 9001: Name or service not known
Waiting for nextcloud to start
nc: getaddrinfo for host "nextcloud-aio-nextcloud" port 9001: Name or service not known
Waiting for nextcloud to start
Waiting for nextcloud to start
Waiting for nextcloud to start
Waiting for nextcloud to start
Waiting for nextcloud to start
Waiting for nextcloud to start
Waiting for nextcloud to start
++ nproc
+ THREADS=16
+ export THREADS
+ set +x
10:55AM INF Starting LocalAI using 16 threads, with models path: /models
10:55AM INF LocalAI version: v1.30.0 (274ace289823a8bacb7b4987b5c961b62d5eee99)
 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.49.2                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 70  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 39 │ 
 └───────────────────────────────────────────────────┘

When I run the test mentioned earlier in the thread:

LOCALAI=http://localhost:8080

curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j", 
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 2 
   }'

The command produces this error:

{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}

I always get this error when a request is sent, either via the Nextcloud AI assistant or the test curl command:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42737: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41939: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45029: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32875: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32875: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32875: connect: connection refused"
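
(Editorial note, a hedged suggestion rather than part of the thread: repeated "connection refused" lines like these usually mean the gRPC backend process exited right after being spawned; the underlying reason only shows up in the backend's stderr, which LocalAI prints when debug logging is enabled. A minimal sketch assuming the stock LocalAI image is run directly; the Nextcloud AIO setup may wire the environment differently.)

# run LocalAI with debug logging enabled so backend stderr is shown
docker run -p 8080:8080 -v $PWD/models:/models -e DEBUG=true quay.io/go-skynet/local-ai:latest
# with DEBUG=true the container logs include "GRPC(...): stderr ..." lines
# that reveal why the backend crashed (missing CPU flags, bad model file, etc.)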
iammaguire commented 6 months ago

Having this issue as well. Are any devs looking at this?

kilmarnock commented 5 months ago

Same issue here.

For me, the main issue seems to be this stderr line: gguf_init_from_file: invalid magic characters 'lmgg'

The file is not in GGUF format (I am no expert).

I am trying to load the model with the llama-cpp backend.
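
(A hedged way to confirm this, not from the original comment: inspect the first four bytes of the model file. A GGUF file starts with the ASCII characters "GGUF"; "lmgg" is the magic of the older GGML-style format, which recent llama.cpp builds no longer load.)

# print the magic bytes of the model file referenced in the trace below
head -c 4 /models/ggml-gpt4all-j.bin; echo
# "GGUF" means a usable GGUF file; "lmgg" means a legacy GGML-era file
# that needs to be re-downloaded or converted to GGUF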

Full Trace:

4:46PM DBG Loading Model ggml-gpt4all-j.bin with gRPC (file: /models/ggml-gpt4all-j.bin) (backend: llama-cpp): {backendString:llama model:ggml-gpt4all-j.bin threads:6 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004be000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:true parallelRequests:false}
4:46PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
4:46PM DBG GRPC Service for ggml-gpt4all-j.bin will be running at: '127.0.0.1:37527'
4:46PM DBG GRPC Service state dir: /tmp/go-processmanager1594555815
4:46PM DBG GRPC Service Started
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stdout Server listening on 127.0.0.1:37527
4:46PM DBG GRPC Service Ready
4:46PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ggml-gpt4all-j.bin ContextSize:0 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:6 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/ggml-gpt4all-j.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr ggml_init_cublas: found 1 CUDA devices:
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr   Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1, VMM: yes
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr gguf_init_from_file: invalid magic characters 'lmgg'
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /models/ggml-gpt4all-j.bin
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr 
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr llama_load_model_from_file: failed to load model
4:46PM DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:37527): stderr llama_init_from_gpt_params: error: failed to load model '/models/ggml-gpt4all-j.bin'
[172.27.0.1]:43202 500 - POST /v1/embeddings
SicLuceatLux commented 5 months ago

Why is it marked as completed? People still have the issue, me included.

JamborJan commented 5 months ago

I have tested with the latest LocalAI version bundled with the Nextcloud AIO 8.0.0 mastercontainer. The difference now is that I don't get the rpc error: code = Unavailable error in the container, but instead see Loading model 'ggml-gpt4all-j.bin' with backend gpt4all-j in the container logs.

++ nproc
+ THREADS=16
+ export THREADS
+ set +x
2:02PM DBG no galleries to load
2:02PM INF Starting LocalAI using 16 threads, with models path: /models
2:02PM INF LocalAI version: v2.9.0 (ff88c390bb51d9567572815a63c575eb2e3dd062)
2:02PM INF Preloading models from /models
2:02PM INF Model name: gpt4all-j
2:02PM INF Model name: stablediffusion
2:02PM INF Model name: whisper-1
 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.50.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ........... 105  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ............... 184 │ 
 └───────────────────────────────────────────────────┘ 
2:04PM INF Loading model 'ggml-gpt4all-j.bin' with backend gpt4all-j
2:06PM INF Loading model 'ggml-gpt4all-j.bin' with backend gpt4all-j
2:07PM INF Loading model 'ggml-gpt4all-j.bin' with backend gpt4all-j

But when testing within the container, nothing changed. I run this:

LOCALAI=http://localhost:8080

curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j", 
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 2 
   }'

and I get this:

{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}

When changing the model I get the error again in the container logs:

2:43PM INF Loading model 'ggml-whisper-base.bin' with backend whisper
2:44PM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35531: connect: connection refused"

So for me too, closing the issue is not the right thing to do.

Punkado commented 5 months ago

I think I might have found the problem, at least in my case.

I had been trying for two days to run it in a VM on a server with an old CPU that does not support AVX2. When I tried it on my desktop and on another server that does have AVX2, it worked without any problem.

I recommend people test this, as I did: if you are running it in a VM, don't forget to set the CPU type to "host" (a sketch of the checks follows this comment).

Is there an easy way to run LocalAI in Docker without needing AVX2?
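
(Editorial sketch of the workaround described above; the hypervisor commands are examples with hypothetical VM names/IDs, adjust to your environment.)

# inside the VM or container: any output means AVX2 is available
grep -m1 -o avx2 /proc/cpuinfo

# libvirt/QEMU guest: expose the host CPU, e.g. via "virsh edit <domain>"
#   <cpu mode='host-passthrough'/>
# Proxmox guest (hypothetical VM ID 101):
#   qm set 101 --cpu host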

JamborJan commented 5 months ago

Thank you @Punkado for your tests. I had the whole setup with Docker in mind, specifically the setup together with Nextcloud. The VM where Docker is running was in my case not AVX2-enabled (grep avx2 /proc/cpuinfo returned nothing). Now I have adjusted the VM where Docker is running and I am passing the CPU through in host mode.

From inside the LocalAI container I can now run:

curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j", 
     "messages": [{"role": "user", "content": "Do you know the city of Lucerne in Switzerland?"}],
     "temperature": 2 
   }'
{"created":1712122678,"object":"chat.completion","id":"ce4eccd9-f86d-417c-9314-96955cbcb5c9","model":"gpt4all-j","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"As an AI language model, I do not physically go places. Therefore, answering the above prompt I don't think would benefit any specific country, town, city, or the like; but based on a fact of Lucerne in Switzerland being located north at a high altibank to the summit it is possible to respond - I have heard about Lucesne or Lucern from tourists that visit Swiss alps. Some travel companies sell itineraries, maps with the name of such small mountain town and nearby villages. Also, a more detailed question is how to get into Swiss alps and where it could cost to pass by? It depends on the route followed, whether it is easy to go by train or not with which I can say there aren't enough services of this way inside or around the Swiss highlands area due to infrastructure and population distribution in that remote mountainous region, which means train travels take some more money or a day's extra travel by the road could have been possible.  So answering \"yes\" would be the appropriate choice and a fair response at this time."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Nextcloud is still not working but I guess that's another thing.

So long story short: you are awesome @Punkado ! Thanks for pointing that out.

I did not find anything in the docs when looking for avx2. Not sure if @mudler can enhance that.