mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

transport: Error while dialing: dial tcp 127.0.0.1:40825: connect: connection refused #2098

Closed: Giancarlo1974 closed this issue 6 months ago.

Giancarlo1974 commented 6 months ago

LocalAI version: v2.12.4-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version: Linux giancubuntu 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux. VM running under Proxmox with an NVIDIA GeForce RTX 4060 Ti.

The environment works correctly with, for example, llama.cpp compiled for CUDA.

Describe the bug

Running `docker-compose up` with the sample from the Getting Started guide, I see the error below in the log.

To Reproduce

cat docker-compose.yml

```yaml
services:
  api:
    # image: localai/localai:latest-aio-cpu
    # For a specific version:
    # image: localai/localai:v2.12.4-aio-cpu
    # For Nvidia GPUs uncomment one of the following (cuda11 or cuda12):
    # image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-11
    image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    # uncomment the following piece if running with Nvidia GPUs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

docker-compose up
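
Since the first start downloads several gigabytes of models, it can help to wait (in another terminal) until the container reports ready before testing. A minimal sketch that polls the same /readyz endpoint the healthcheck above already uses:

```bash
# Poll LocalAI's readiness endpoint until it returns HTTP 200
until curl -sf http://localhost:8080/readyz > /dev/null; do
  echo "waiting for LocalAI to become ready..."
  sleep 5
done
echo "LocalAI is ready"
```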

In another terminal, run:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
```

I got this error:

```json
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}
```

Expected behavior

The curl command should respond to my question.
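
For reference, a successful call returns an OpenAI-style chat completion, roughly like the following (illustrative values, not output from this setup):

```json
{
  "created": 1713734400,
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I help you today?"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 14,
    "total_tokens": 26
  }
}
```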

Logs

In the docker log I see:

```
Attaching to api-1
api-1 | ===> LocalAI All-in-One (AIO) container starting...
api-1 | NVIDIA GPU detected
api-1 | Sun Apr 21 21:19:56 2024
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
api-1 | |-----------------------------------------+------------------------+----------------------+
api-1 | | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
api-1 | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
api-1 | | | | MIG M. |
api-1 | |=========================================+========================+======================|
api-1 | | 0 NVIDIA GeForce RTX 4060 Ti Off | 00000000:00:10.0 Off | N/A |
api-1 | | 0% 32C P8 8W / 165W | 1MiB / 16380MiB | 0% Default |
api-1 | | | | N/A |
api-1 | +-----------------------------------------+------------------------+----------------------+
api-1 |
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | | Processes: |
api-1 | | GPU GI CI PID Type Process name GPU Memory |
api-1 | | ID ID Usage |
api-1 | |=========================================================================================|
api-1 | | No running processes found |
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | NVIDIA GPU detected. Attempting to find memory size...
api-1 | Total GPU Memory: 16380 MiB
api-1 | ===> Starting LocalAI[gpu-8g] with the following models: /aio/gpu-8g/embeddings.yaml,/aio/gpu-8g/text-to-speech.yaml,/aio/gpu-8g/image-gen.yaml,/aio/gpu-8g/text-to-text.yaml,/aio/gpu-8g/speech-to-text.yaml,/aio/gpu-8g/vision.yaml
api-1 | @@@@@
api-1 | Skipping rebuild
api-1 | @@@@@
api-1 | If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
api-1 | If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
api-1 | CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
api-1 | see the documentation at: https://localai.io/basics/build/index.html
api-1 | Note: See also https://github.com/go-skynet/LocalAI/issues/288
api-1 | @@@@@
api-1 | CPU info:
api-1 | model name : QEMU Virtual CPU version 2.5+
api-1 | flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm cpuid_fault pti
api-1 | CPU: no AVX found
api-1 | CPU: no AVX2 found
api-1 | CPU: no AVX512 found
api-1 | @@@@@
api-1 | 9:19PM INF Starting LocalAI using 4 threads, with models path: /build/models
api-1 | 9:19PM INF LocalAI version: v2.12.4 (0004ec8be3ca150ce6d8b79f2991bfe3a9dc65ad)
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/embeddings.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/text-to-speech.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/image-gen.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/text-to-text.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/speech-to-text.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/vision.yaml
api-1 | 9:19PM INF Preloading models from /build/models
api-1 | 9:19PM DBG Checking "DreamShaper_8_pruned.safetensors" exists and matches SHA
api-1 | 9:19PM INF Downloading "https://huggingface.co/Lykon/DreamShaper/resolve/main/DreamShaper_8_pruned.safetensors"
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 491.8 MiB/2.0 GiB (24.18%) ETA: 15.679161447s
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 1.0 GiB/2.0 GiB (51.31%) ETA: 9.491204298s
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 1.5 GiB/2.0 GiB (73.43%) ETA: 5.429262496s
api-1 | 9:20PM DBG SHA missing for "/build/models/DreamShaper_8_pruned.safetensors". Skipping validation
api-1 | 9:20PM INF File "/build/models/DreamShaper_8_pruned.safetensors" downloaded and verified
api-1 |
api-1 | Model name: stablediffusion
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/images/generations -H "Content-Type:
api-1 | application/json" -d '{ "prompt": "|", "step": 25, "size": "512x512" }'
api-1 |
api-1 |
api-1 | 9:20PM DBG Checking "llava-v1.6-mistral-7b.Q5_K_M.gguf" exists and matches SHA
api-1 | 9:20PM INF Downloading "https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/llava-v1.6-mistral-7b.Q5_K_M.gguf"
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 8.0 KiB/4.8 GiB (0.00%) ETA: 3517h41m8.95587088s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 550.0 MiB/4.8 GiB (11.24%) ETA: 3m19.165169205s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 1.1 GiB/4.8 GiB (22.25%) ETA: 1m45.601816652s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 1.6 GiB/4.8 GiB (33.38%) ETA: 1m10.298108801s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 2.1 GiB/4.8 GiB (44.67%) ETA: 49.814411196s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 2.6 GiB/4.8 GiB (55.38%) ETA: 36.431612798s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 3.2 GiB/4.8 GiB (66.60%) ETA: 25.190075009s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 3.7 GiB/4.8 GiB (77.89%) ETA: 15.679418344s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 4.2 GiB/4.8 GiB (88.68%) ETA: 7.68817045s
api-1 | 9:21PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 4.8 GiB/4.8 GiB (99.98%) ETA: 10.002964ms
api-1 | 9:21PM DBG SHA missing for "/build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf". Skipping validation
api-1 | 9:21PM INF File "/build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf" downloaded and verified
api-1 | 9:21PM DBG Checking "llava-v1.6-7b-mmproj-f16.gguf" exists and matches SHA
api-1 | 9:21PM INF Downloading "https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/mmproj-model-f16.gguf"
api-1 | 9:21PM INF Downloading /build/models/llava-v1.6-7b-mmproj-f16.gguf.partial: 519.5 MiB/595.5 MiB (87.24%) ETA: 10.276415747s
api-1 | 9:21PM DBG SHA missing for "/build/models/llava-v1.6-7b-mmproj-f16.gguf". Skipping validation
api-1 | 9:21PM INF File "/build/models/llava-v1.6-7b-mmproj-f16.gguf" downloaded and verified
api-1 |
api-1 | Model name: gpt-4-vision-preview
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/chat/completions -H "Content-Type:
api-1 | application/json" -d '{ "model": "gpt-4-vision-preview", "messages": [{"role":
api-1 | "user", "content": [{"type":"text", "text": "What is in the image?"},
api-1 | {"type": "image_url", "image_url": {"url":
api-1 | "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-
api-1 | madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-
api-1 | boardwalk.jpg" }}], "temperature": 0.9}]}'
api-1 |
api-1 |
api-1 | 9:21PM DBG Checking "voice-en-us-amy-low.tar.gz" exists and matches SHA
api-1 | 9:21PM INF Downloading "https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz"
api-1 | 9:21PM DBG SHA missing for "/build/models/voice-en-us-amy-low.tar.gz". Skipping validation
api-1 | 9:21PM INF File "/build/models/voice-en-us-amy-low.tar.gz" downloaded and verified
api-1 | 9:21PM INF File "/build/models/voice-en-us-amy-low.tar.gz" is an archive, uncompressing to /build/models
api-1 |
api-1 | Model name: tts-1
api-1 |
api-1 |
api-1 |
api-1 | To test if this model works as expected, you can use the following curl
api-1 | command:
api-1 |
api-1 | curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
api-1 | "model":"tts-1", "input": "Hi, this is a test." }'
api-1 |
api-1 |
api-1 |
api-1 | Model name: text-embedding-ada-002
api-1 |
api-1 |
api-1 | 9:21PM INF Downloading "https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q6_K.gguf"
api-1 |
api-1 | You can test this model with curl like this:
api-1 |
api-1 | curl http://localhost:8080/embeddings -X POST -H "Content-Type:
api-1 | application/json" -d '{ "input": "Your text string goes here", "model": "text-
api-1 | embedding-ada-002" }'
api-1 |
api-1 |
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 282.3 MiB/5.5 GiB (4.98%) ETA: 23m55.288008036s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 812.5 MiB/5.5 GiB (14.34%) ETA: 7m59.416657726s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 1.3 GiB/5.5 GiB (24.07%) ETA: 4m28.900725495s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 1.9 GiB/5.5 GiB (33.82%) ETA: 2m56.56875477s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 2.4 GiB/5.5 GiB (43.16%) ETA: 2m5.413705381s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 2.9 GiB/5.5 GiB (52.89%) ETA: 1m29.305942754s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 3.5 GiB/5.5 GiB (62.66%) ETA: 1m2.70974332s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 4.0 GiB/5.5 GiB (72.03%) ETA: 42.818593305s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 4.5 GiB/5.5 GiB (81.77%) ETA: 25.701652455s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 5.0 GiB/5.5 GiB (91.13%) ETA: 11.697674152s
api-1 | 9:22PM DBG SHA missing for "/build/models/5c7cd056ecf9a4bb5b527410b97f48cb". Skipping validation
api-1 | 9:22PM INF File "/build/models/5c7cd056ecf9a4bb5b527410b97f48cb" downloaded and verified
api-1 |
api-1 | Model name: gpt-4
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/chat/completions -H "Content-Type:
api-1 | application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user",
api-1 | "content": "How are you doing?", "temperature": 0.1}] }'
api-1 |
api-1 |
api-1 | 9:22PM DBG Checking "ggml-whisper-base.bin" exists and matches SHA
api-1 | 9:22PM INF Downloading "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
api-1 | 9:22PM INF Downloading /build/models/ggml-whisper-base.bin.partial: 14.3 MiB/141.1 MiB (10.12%) ETA: 18m32.413687204s
api-1 | 9:22PM INF File "/build/models/ggml-whisper-base.bin" downloaded and verified
api-1 |
api-1 | Model name: whisper-1
api-1 |
api-1 |
api-1 |
api-1 | ## example audio file
api-1 |
api-1 | wget --quiet --show-progress -O gb1.ogg
api-1 | https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
api-1 |
api-1 | ## Send the example audio file to the transcriptions endpoint
api-1 |
api-1 | curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type:
api-1 | multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
api-1 | api-1 | api-1 | 9:22PM DBG Model: gpt-4-vision-preview (config: {PredictionOptions:{Model:llava-v1.6-mistral-7b.Q5_K_M.gguf Language: N:0 TopP:0xc0003473b0 TopK:0xc0003473a8 Temperature:0xc000347388 Maxtokens:0xc000347420 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347448 TypicalP:0xc000347440 Seed:0xc0003473d0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4-vision-preview F16:0xc000347380 Threads:0xc0003473f8 Debug:0xc000347458 Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama-cpp TemplateConfig:{Chat:A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. api-1 | {{.Input}} api-1 | ASSISTANT: api-1 | ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347438 MirostatTAU:0xc000347430 Mirostat:0xc000347428 NGPULayers:0xc000347450 MMap:0xc000347381 MMlock:0xc000347459 LowVRAM:0xc000347459 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000347370 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj:llava-v1.6-7b-mmproj-f16.gguf RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:llava-v1.6-mistral-7b.Q5_K_M.gguf SHA256: URI:huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf} {Filename:llava-v1.6-7b-mmproj-f16.gguf SHA256: URI:huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf}] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ api-1 | "model": "gpt-4-vision-preview", api-1 | "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}' api-1 | }) api-1 | 9:22PM DBG Model: tts-1 (config: {PredictionOptions:{Model:en-us-amy-low.onnx Language: N:0 TopP:0xc000347538 TopK:0xc000347540 Temperature:0xc000347548 Maxtokens:0xc000347550 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347578 TypicalP:0xc000347570 Seed:0xc000347590 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:tts-1 F16:0xc000347530 Threads:0xc000347528 Debug:0xc000347588 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] 
InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347568 MirostatTAU:0xc000347560 Mirostat:0xc000347558 NGPULayers:0xc000347580 MMap:0xc000347588 MMlock:0xc000347589 LowVRAM:0xc000347589 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000347520 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:voice-en-us-amy-low.tar.gz SHA256: URI:https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz}] Description: Usage:To test if this model works as expected, you can use the following curl command: api-1 | api-1 | curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{ api-1 | "model":"tts-1", api-1 | "input": "Hi, this is a test." api-1 | }'}) api-1 | 9:22PM DBG Model: text-embedding-ada-002 (config: {PredictionOptions:{Model:all-MiniLM-L6-v2 Language: N:0 TopP:0xc000346780 TopK:0xc000346788 Temperature:0xc000346790 Maxtokens:0xc000346798 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc0003467c0 TypicalP:0xc0003467b8 Seed:0xc0003467d8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:text-embedding-ada-002 F16:0xc000346778 Threads:0xc000346770 Debug:0xc0003467d0 Roles:map[] Embeddings:false Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc0003467b0 MirostatTAU:0xc0003467a8 Mirostat:0xc0003467a0 NGPULayers:0xc0003467c8 MMap:0xc0003467d0 MMlock:0xc0003467d1 LowVRAM:0xc0003467d1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346768 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:You can test this model with curl like this: api-1 | api-1 | curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{ 
api-1 | "input": "Your text string goes here", api-1 | "model": "text-embedding-ada-002" api-1 | }'}) api-1 | 9:22PM DBG Model: gpt-4 (config: {PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc000346ab0 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}} api-1 | <|im_start|>assistant api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}} api-1 | {{- if .FunctionCall }}{{end}} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | {{- if .Content}} api-1 | {{.Content}} api-1 | {{- end }} api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }} api-1 | {{- if .FunctionCall }}{{end }} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | <|im_end|> api-1 | Completion:{{.Input}} api-1 | Edit: Functions:<|im_start|>system api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: api-1 | api-1 | {{range .Functions}} api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }} api-1 | {{end}} api-1 | api-1 | Use the following pydantic model json schema for each tool call you will make: api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} api-1 | For each function call return a json object with function name and arguments within XML tags as follows: api-1 | api-1 | {'arguments': , 'name': } api-1 | api-1 | <|im_end|> api-1 | {{.Input -}} api-1 | <|im_start|>assistant api-1 | api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|> api-1 | api-1 | api-1 | api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: 
Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ api-1 | "model": "gpt-4", api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] api-1 | }' api-1 | }) api-1 | 9:22PM DBG Model: whisper-1 (config: {PredictionOptions:{Model:ggml-whisper-base.bin Language: N:0 TopP:0xc000346c28 TopK:0xc000346c30 Temperature:0xc000346c38 Maxtokens:0xc000346c40 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346c98 TypicalP:0xc000346c90 Seed:0xc000346cf0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:whisper-1 F16:0xc000346c20 Threads:0xc000346c18 Debug:0xc000346ce8 Roles:map[] Embeddings:false Backend:whisper TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346c58 MirostatTAU:0xc000346c50 Mirostat:0xc000346c48 NGPULayers:0xc000346ce0 MMap:0xc000346ce8 MMlock:0xc000346ce9 LowVRAM:0xc000346ce9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346c10 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:ggml-whisper-base.bin SHA256:60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe URI:https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin}] Description: Usage:## example audio file api-1 | wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg api-1 | api-1 | ## Send the example audio file to the transcriptions endpoint api-1 | curl http://localhost:8080/v1/audio/transcriptions \ api-1 | -H "Content-Type: multipart/form-data" \ api-1 | -F file="@$PWD/gb1.ogg" -F model="whisper-1" api-1 | }) api-1 | 9:22PM DBG Model: stablediffusion (config: {PredictionOptions:{Model:DreamShaper_8_pruned.safetensors Language: N:0 TopP:0xc000347008 TopK:0xc000347010 Temperature:0xc000347018 Maxtokens:0xc000347020 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347048 TypicalP:0xc000347040 Seed:0xc000347080 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:stablediffusion F16:0xc000346f85 Threads:0xc000346ff8 Debug:0xc000347058 Roles:map[] Embeddings:false Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] 
LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347038 MirostatTAU:0xc000347030 Mirostat:0xc000347028 NGPULayers:0xc000347050 MMap:0xc000347058 MMlock:0xc000347059 LowVRAM:0xc000347059 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346ff0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m EnableParameters:negative_prompt,num_inference_steps CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:DreamShaper_8_pruned.safetensors SHA256: URI:huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors}] Description: Usage:curl http://localhost:8080/v1/images/generations \ api-1 | -H "Content-Type: application/json" \ api-1 | -d '{ api-1 | "prompt": "|", api-1 | "step": 25, api-1 | "size": "512x512" api-1 | }'}) api-1 | 9:22PM DBG Extracting backend assets files to /tmp/localai/backend_data api-1 | 9:22PM INF core/startup process completed! api-1 | 9:22PM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json api-1 | 9:22PM DBG No configuration file found at /tmp/localai/config/assistants.json api-1 | 9:22PM DBG No configuration file found at /tmp/localai/config/assistantsFile.json api-1 | api-1 | ┌───────────────────────────────────────────────────┐ api-1 | │ Fiber v2.52.0 │ api-1 | │ http://127.0.0.1:8080 │ api-1 | │ (bound on host 0.0.0.0 and port 8080) │ api-1 | │ │ api-1 | │ Handlers ........... 181 Processes ........... 1 │ api-1 | │ Prefork ....... Disabled PID ................. 1 │ api-1 | └───────────────────────────────────────────────────┘ api-1 | api-1 | [127.0.0.1]:59222 200 - GET /readyz

api-1 | [127.0.0.1]:41692 200 - GET /readyz api-1 | [127.0.0.1]:46284 200 - GET /readyz api-1 | 9:25PM DBG Request received: {"model":"gpt-4","language":"","n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you doing?"}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""} api-1 | 9:25PM DBG Configuration read: &{PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc00015bc98 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}} api-1 | <|im_start|>assistant api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}} api-1 | {{- if .FunctionCall }}{{end}} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | {{- if .Content}} api-1 | {{.Content}} api-1 | {{- end }} api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }} api-1 | {{- if .FunctionCall }}{{end }} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | <|im_end|> api-1 | Completion:{{.Input}} api-1 | Edit: Functions:<|im_start|>system api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. 
Here are the available tools: api-1 | api-1 | {{range .Functions}} api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }} api-1 | {{end}} api-1 | api-1 | Use the following pydantic model json schema for each tool call you will make: api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} api-1 | For each function call return a json object with function name and arguments within XML tags as follows: api-1 | api-1 | {'arguments': , 'name': } api-1 | api-1 | <|im_end|> api-1 | {{.Input -}} api-1 | <|im_start|>assistant api-1 | api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|> api-1 | api-1 | api-1 | api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ api-1 | "model": "gpt-4", api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] api-1 | }' api-1 | } api-1 | 9:25PM DBG Parameters: &{PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc00015bc98 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}} api-1 | <|im_start|>assistant api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}} api-1 | {{- if .FunctionCall }}{{end}} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | {{- if .Content}} api-1 | {{.Content}} api-1 | {{- end }} api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }} api-1 | {{- if .FunctionCall }}{{end }} api-1 | {{- if eq .RoleName "tool" }}{{end }} api-1 | <|im_end|> api-1 | Completion:{{.Input}} api-1 | Edit: Functions:<|im_start|>system api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. 
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: api-1 | api-1 | {{range .Functions}} api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }} api-1 | {{end}} api-1 | api-1 | Use the following pydantic model json schema for each tool call you will make: api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} api-1 | For each function call return a json object with function name and arguments within XML tags as follows: api-1 | api-1 | {'arguments': , 'name': } api-1 | api-1 | <|im_end|> api-1 | {{.Input -}} api-1 | <|im_start|>assistant api-1 | api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|> api-1 | api-1 | api-1 | api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ api-1 | "model": "gpt-4", api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] api-1 | }' api-1 | } api-1 | 9:25PM DBG templated message for chat: <|im_start|>user api-1 | How are you doing? api-1 | <|im_end|> api-1 | api-1 | 9:25PM DBG Prompt (before templating): <|im_start|>user api-1 | How are you doing? api-1 | <|im_end|> api-1 | api-1 | 9:25PM DBG Template found, input modified to: <|im_start|>user api-1 | How are you doing? api-1 | <|im_end|> api-1 | <|im_start|>assistant api-1 | api-1 | 9:25PM DBG Prompt (after templating): <|im_start|>user api-1 | How are you doing? 
api-1 | <|im_end|>
api-1 | <|im_start|>assistant
api-1 |
api-1 | 9:25PM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/exllama/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/vllm/run.sh
api-1 | 9:25PM INF [llama-cpp] Attempting to load
api-1 | 9:25PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
api-1 | 9:25PM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
api-1 | 9:25PM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-cpp): {backendString:llama-cpp model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002c2200 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api-1 | 9:25PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
api-1 | 9:25PM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:45957'
api-1 | 9:25PM DBG GRPC Service state dir: /tmp/go-processmanager2082677939
api-1 | 9:25PM DBG GRPC Service Started
api-1 | [127.0.0.1]:35002 200 - GET /readyz
api-1 | 9:26PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:45957: connect: connection refused\""
api-1 | 9:26PM DBG GRPC Service NOT ready
```
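
Note the `CPU: no AVX found` / `CPU: no AVX2 found` lines in the startup banner: the VM exposes a "QEMU Virtual CPU version 2.5+" without AVX, and the pre-compiled llama-cpp backend is typically built with those instruction sets, so the backend process may be crashing as soon as it is spawned, which would explain the connection refused when the API dials its gRPC port at 127.0.0.1:45957. One possible workaround (a sketch, untested here) is to rebuild the backends inside the container, following the banner's own suggestion; the extra compose environment entries would look roughly like this, with `-DLLAMA_AVX=OFF` added to the banner's flags because this CPU reports no AVX at all:

```yaml
    environment:
      - DEBUG=true
      - REBUILD=true
      # Flags from the startup banner, plus -DLLAMA_AVX=OFF for this AVX-less CPU
      - CMAKE_ARGS=-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF
```

Alternatively, setting the Proxmox VM's CPU type to `host` passes the physical CPU's instruction sets through to the guest; `grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u` inside the VM shows which flags are visible.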

localai-bot commented 6 months ago

I apologize for the technical difficulties. It seems like there was an issue with the connection to the model. Let's try the question again.

How are you doing?

Giancarlo1974 commented 6 months ago

It's bad: the connection to the model doesn't work.

I also tried a few weeks ago, and I still got the same error.

Can you help me solve the bug?

cryptk commented 6 months ago

Can you please put the docker compose file inside of a code block so that the formatting is preserved?

Can you also attach the logs as a file? They will be much easier to inspect that way
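
For example, with the service name from the sample compose file:

```bash
# Write the api service's logs to a file that can be attached to the issue
docker-compose logs --no-color api > localai.log 2>&1
```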

Giancarlo1974 commented 6 months ago

Attachments: f7b2cb186fd0ec4361eba4968c70f769820580c4b2025de8e6704e565fa6d5c9-json.log and docker-compose.yml.txt

Logs created by running the command:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
```

paulczar commented 6 months ago

I get the same errors, both from Docker and from the raw binary; I've tried 5 or 6 models.