mudler / LocalAI

:robot: The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware; no GPU required. Runs gguf, transformers, diffusers, and many other model architectures, and can generate text, audio, video, and images, with voice-cloning capabilities.
https://localai.io
MIT License

"SIGILL: illegal instruction" causing "error reading from server: EOF" #1453

Closed · that1guy closed this 6 months ago

that1guy commented 6 months ago

My CPU supports AVX, but not AVX2 or AVX512. This causes issues I can't seem to work around, even when rebuilding with the proper CPU flags. See my .env:

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## Models to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
MODELS_PATH=/models

## Enable debug mode
DEBUG=true

## Disable COMPEL (lets Stable Diffusion work; uncomment if you plan on using it)
# COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
BUILD_TYPE=openblas

## Uncomment and set to true to enable rebuilding from source
REBUILD=true

## Remove CPU flags
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF"

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT

# HUGGINGFACEHUB_API_TOKEN=Token here
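
Before rebuilding, it can help to confirm exactly which AVX-family extensions the CPU reports. A quick check on Linux (reads `/proc/cpuinfo`, so Linux-only; run it inside the container if in doubt):

```shell
# Print the AVX-family flags the kernel reports for this CPU.
# An AVX-only machine prints just "avx"; AVX2/AVX-512-capable CPUs
# additionally list "avx2", "avx512f", and so on.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^avx' | sort -u
```

If `avx2` is missing here, any binary compiled with AVX2 code paths will die with SIGILL, which matches the crash described in this issue.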

LocalAI version: Latest

Environment, CPU architecture, OS, and Version:

Describe the bug

To Reproduce

Issue this HTTP request:

curl http://localhost:9000/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "lunademo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'
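
The same request from Python, as a minimal stdlib-only sketch (host, port, and model name are taken from the curl call above; adjust to your deployment):

```python
import json
import urllib.request

def build_chat_request(base_url, model, content, temperature=0.9):
    """Assemble the OpenAI-style /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:9000", "lunademo", "How are you?")
# urllib.request.urlopen(req) raises HTTPError 500 while this bug is present;
# on a healthy server it returns the chat-completion JSON.
```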

Receive an HTTP 500 error response:

{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}

Expected behavior

HTTP 200 response

Logs

See attached file:
[Boot & Error Logs.txt](https://github.com/mudler/LocalAI/files/13695032/Boot.Error.Logs.txt)

Additional context

Using the following model and configs:


Lunademo.yaml:

backend: llama
context_size: 4096
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_K_M.gguf
  temperature: 0.2
template:
  chat: luna-chat-message
threads: 10
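
For reference, with `REBUILD=true` the container rebuilds on start using the `CMAKE_ARGS` from the .env above; the equivalent manual build from a source checkout looks roughly like this (commands per the LocalAI build documentation; adjust the flags to match your CPU):

```shell
# Rebuild LocalAI with the AVX2/AVX-512/FMA/F16C code paths disabled,
# mirroring the CMAKE_ARGS used in the .env above.
git clone https://github.com/mudler/LocalAI
cd LocalAI
BUILD_TYPE=openblas \
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" \
  make build
```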
justaCasualCoder commented 6 months ago

Same here... Compiling from source with the latest commit to see if it is fixed...

EDIT1: Rebuilding did not help :(. This issue seems to be related to #1447, #288, and #950. I think it is an issue in llama.cpp. I thought that commit 86a8df1c8b44c7e18aceae04cf9b912677c1bdb2 fixed it, but it does not seem like it did...

EDIT2: Logs, if they are useful:

Logs ``` @@@@@ Skipping rebuild @@@@@ If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed: CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" see the documentation at: https://localai.io/basics/build/index.html Note: See also https://github.com/go-skynet/LocalAI/issues/288 @@@@@ CPU info: model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d CPU: AVX found OK CPU: no AVX2 found CPU: no AVX512 found @@@@@ 12:34AM INF Starting LocalAI using 4 threads, with models path: /models 12:34AM INF LocalAI version: v2.1.0-2-g86a8df1 (86a8df1c8b44c7e18aceae04cf9b912677c1bdb2) 12:34AM DBG Extracting backend assets files to /tmp/localai/backend_data ┌───────────────────────────────────────────────────┐ │ Fiber v2.50.0 │ │ http://127.0.0.1:8080 │ │ (bound on host 0.0.0.0 and port 8080) │ │ │ │ Handlers ............ 74 Processes ........... 1 │ │ Prefork ....... Disabled PID ................ 
14 │ └───────────────────────────────────────────────────┘ 12:34AM DBG Request received: 12:34AM DBG `input`: &{PredictionOptions:{Model:gpt-3.5-turbo Language: N:0 TopP:0 TopK:0 Temperature:0.7 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Context:context.Background.WithCancel Cancel:0x4a99c0 File: ResponseFormat:{Type:} Size: Prompt:A long time ago in a galaxy far, far away Instruction: Input: Stop: Messages:[] Functions:[] FunctionCall: Stream:false Mode:0 Step:0 Grammar: JSONFunctionGrammarObject: Backend: ModelBaseName:} 12:34AM DBG Parameter Config: &{PredictionOptions:{Model:gpt-3.5-turbo Language: N:0 TopP:0.7 TopK:80 Temperature:0.7 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[A long time ago in a galaxy far, far away] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:700 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 
AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false} 12:34AM DBG Loading model 'gpt-3.5-turbo' greedly from all the available backends: llama-cpp, llama-ggml, llama, gpt4all, gptneox, bert-embeddings, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, rwkv, whisper, stablediffusion, piper, /build/backend/python/vall-e-x/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/sentencetransformers/run.sh 12:34AM DBG [llama-cpp] Attempting to load 12:34AM INF Loading model 'gpt-3.5-turbo' with backend llama-cpp 12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: llama-cpp): {backendString:llama-cpp model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh 
transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp 12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:37163' 12:34AM DBG GRPC Service state dir: /tmp/go-processmanager839887945 12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37163: connect: connection refused" 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37163): stdout Server listening on 127.0.0.1:37163 12:34AM DBG GRPC Service Ready 12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37163): stderr error loading model: failed to open /models/gpt-3.5-turbo: No such file or directory 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37163): stdout {"timestamp":1702859684,"level":"ERROR","function":"load_model","line":589,"message":"unable to load model","model":"/models/gpt-3.5-turbo"} 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37163): stderr 
llama_load_model_from_file: failed to load model 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37163): stderr llama_init_from_gpt_params: error: failed to load model '/models/gpt-3.5-turbo' 12:34AM DBG [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc = 12:34AM DBG [llama-ggml] Attempting to load 12:34AM INF Loading model 'gpt-3.5-turbo' with backend llama-ggml 12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: llama-ggml): {backendString:llama-ggml model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-ggml 12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:42213' 12:34AM DBG GRPC Service state dir: /tmp/go-processmanager1884516395 12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42213: connect: connection refused" 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 2023/12/18 00:34:44 gRPC Server listening at 127.0.0.1:42213 12:34AM DBG GRPC Service Ready 
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr SIGILL: illegal instruction 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr PC=0x83939c m=4 sigcode=2 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr signal arrived during cgo execution 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr instruction bytes: 0xc4 0xe3 0x7d 0x39 0x8c 0x24 0x18 0x3 0x0 0x0 0x1 0x66 0x89 0x84 0x24 0x0 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 34 [syscall]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.cgocall(0x81ace0, 0xc000225640) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000225618 sp=0xc0002255e0 pc=0x41368b 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7f27c8000ca0, 0x2bc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200, ...) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr _cgo_gotypes.go:250 +0x4c fp=0xc000225640 sp=0xc000225618 pc=0x80fa0c 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc00025c000, 0x15}, {0xc000202100, 0x7, 0x8f7380?}) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/sources/go-llama-ggml/llama.go:28 +0x299 fp=0xc0002257c0 sp=0xc000225640 pc=0x8101b9 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr main.(*LLM).Load(0xc000036cd0, 0xc00021e000) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/backend/go/llm/llama-ggml/llama.go:73 +0x92e fp=0xc000225900 sp=0xc0002257c0 pc=0x8183ae 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc000036da0, {0xc00021e000?, 0x4fe226?}, 0x0?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/pkg/grpc/server.go:50 +0xe6 fp=0xc0002259b0 sp=0xc000225900 pc=0x815f66 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x9686a0?, 0xc000036da0}, {0xa4ee30, 0xc0002000f0}, 0xc000206080, 0x0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc000225a08 sp=0xc0002259b0 pc=0x804f89 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001a21e0, {0xa4ee30, 0xc000292240}, {0xa52358, 0xc000102b60}, 0xc0002aa000, 0xc0001aacc0, 0xd56430, 0x0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xe03 fp=0xc000225df0 sp=0xc000225a08 pc=0x7edf03 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001a21e0, {0xa52358, 0xc000102b60}, 0xc0002aa000) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0xc4c fp=0xc000225f78 
sp=0xc000225df0 pc=0x7f2e6c 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x86 fp=0xc000225fe0 sp=0xc000225f78 pc=0x7ebe06 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000225fe8 sp=0xc000225fe0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 25 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:997 +0x145 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 1 [IO wait]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0x4c2b10?, 0xc00019fb28?, 0x78?, 0xfb?, 0x4e2ebd?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00019fb08 sp=0xc00019fae8 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.netpollblock(0x474a52?, 0x412e26?, 0x0?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00019fb40 sp=0xc00019fb08 pc=0x4408d7 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.runtime_pollWait(0x7f27d82feeb0, 0x72) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00019fb60 sp=0xc00019fb40 pc=0x471905 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*pollDesc).wait(0xc0000ee680?, 0x4?, 0x0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00019fb88 sp=0xc00019fb60 pc=0x4dbb27 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*pollDesc).waitRead(...) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*FD).Accept(0xc0000ee680) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc00019fc30 sp=0xc00019fb88 pc=0x4e100c 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*netFD).accept(0xc0000ee680) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc00019fce8 sp=0xc00019fc30 pc=0x63c949 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*TCPListener).accept(0xc00007a4c0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc00019fd10 sp=0xc00019fce8 pc=0x65391e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*TCPListener).Accept(0xc00007a4c0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc00019fd40 sp=0xc00019fd10 pc=0x652ad0 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).Serve(0xc0001a21e0, {0xa4e440?, 0xc00007a4c0}) 12:34AM DBG 
GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:852 +0x462 fp=0xc00019fe80 sp=0xc00019fd40 pc=0x7eaa62 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffd183ddabc?, 0xc000024160?}, {0xa52a80?, 0xc000036cd0}) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/pkg/grpc/server.go:178 +0x17d fp=0xc00019ff10 sp=0xc00019fe80 pc=0x81795d 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr main.main() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /build/backend/go/llm/llama-ggml/main.go:16 +0x85 fp=0xc00019ff40 sp=0xc00019ff10 pc=0x81a405 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.main() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc00019ffe0 sp=0xc00019ff40 pc=0x4479db 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00019ffe8 sp=0xc00019ffe0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 2 [force gc (idle)]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005afa8 sp=0xc00005af88 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goparkunlock(...) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:404 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.forcegchelper() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00005afe0 sp=0xc00005afa8 pc=0x447cb3 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005afe8 sp=0xc00005afe0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by runtime.init.6 in goroutine 1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:310 +0x1a 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 3 [GC sweep wait]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005b778 sp=0xc00005b758 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goparkunlock(...) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:404 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.bgsweep(0x0?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00005b7c8 sp=0xc00005b778 pc=0x433d54 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gcenable.func1() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00005b7e0 sp=0xc00005b7c8 pc=0x428f05 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005b7e8 sp=0xc00005b7e0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by runtime.gcenable in goroutine 1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgc.go:200 +0x66 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 4 [GC scavenge wait]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0xc00007c000?, 0xa47618?, 0x1?, 0x0?, 0xc0000071e0?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005bf70 sp=0xc00005bf50 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goparkunlock(...) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:404 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.(*scavengerState).park(0xd9f8e0) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00005bfa0 sp=0xc00005bf70 pc=0x431629 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.bgscavenge(0x0?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00005bfc8 sp=0xc00005bfa0 pc=0x431bbc 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gcenable.func2() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc00005bfe0 sp=0xc00005bfc8 pc=0x428ea5 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005bfe8 sp=0xc00005bfe0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by runtime.gcenable in goroutine 1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mgc.go:201 +0xa5 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 5 [finalizer wait]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0x198?, 0x9925a0?, 0x1?, 0x8f?, 0x0?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005a620 sp=0xc00005a600 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.runfinq() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00005a7e0 sp=0xc00005a620 pc=0x427f27 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005a7e8 sp=0xc00005a7e0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by runtime.createfing in goroutine 1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/mfinal.go:163 +0x3d 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 23 [select]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0xc0002a1f00?, 0x2?, 0x1e?, 0x0?, 0xc0002a1ed4?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0002a1d80 sp=0xc0002a1d60 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.selectgo(0xc0002a1f00, 0xc0002a1ed0, 0x7851d6?, 0x0, 0xc00031a000?, 0x1) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc0002a1ea0 sp=0xc0002a1d80 pc=0x457885 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc00011c690, 0x1) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc0002a1f30 sp=0xc0002a1ea0 pc=0x764033 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000284070) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc0002a1f90 sp=0xc0002a1f30 pc=0x764746 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:336 +0xd5 fp=0xc0002a1fe0 sp=0xc0002a1f90 pc=0x77af95 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0002a1fe8 sp=0xc0002a1fe0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 22 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:333 +0x1acc 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): 
stderr goroutine 24 [select]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0xc000057770?, 0x4?, 0xe0?, 0xed?, 0xc0000576c0?) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000057528 sp=0xc000057508 pc=0x447e2e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.selectgo(0xc000057770, 0xc0000576b8, 0x0?, 0x0, 0x0?, 0x1) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000057648 sp=0xc000057528 pc=0x457885 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc000102b60) 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:1152 +0x225 fp=0xc0000577c8 sp=0xc000057648 pc=0x782245 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x25 fp=0xc0000577e0 sp=0xc0000577c8 pc=0x77ae85 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit() 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000577e8 sp=0xc0000577e0 pc=0x4769e1 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 22 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x1b0e 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr goroutine 25 [IO wait]: 12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.gopark(0xdb7a40?, 0xb?, 0x0?, 0x0?, 0x6?) 
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006aaa0 sp=0xc00006aa80 pc=0x447e2e
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.netpollblock(0x4c0d98?, 0x412e26?, 0x0?)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00006aad8 sp=0xc00006aaa0 pc=0x4408d7
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.runtime_pollWait(0x7f27d82fedb8, 0x72)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00006aaf8 sp=0xc00006aad8 pc=0x471905
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*pollDesc).wait(0xc00013e380?, 0xc000312000?, 0x0)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00006ab20 sp=0xc00006aaf8 pc=0x4dbb27
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*pollDesc).waitRead(...)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr internal/poll.(*FD).Read(0xc00013e380, {0xc000312000, 0x8000, 0x8000})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00006abb8 sp=0xc00006ab20 pc=0x4dce1a
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*netFD).Read(0xc00013e380, {0xc000312000?, 0x1060100000000?, 0x8?})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00006ac00 sp=0xc00006abb8 pc=0x63a925
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*conn).Read(0xc000110058, {0xc000312000?, 0x0?, 0xc00006acd0?})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/net/net.go:179 +0x45 fp=0xc00006ac48 sp=0xc00006ac00 pc=0x64b045
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr net.(*TCPConn).Read(0x0?, {0xc000312000?, 0xc00006aca0?, 0x465d2d?})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr :1 +0x25 fp=0xc00006ac78 sp=0xc00006ac48 pc=0x65d7e5
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr bufio.(*Reader).Read(0xc0001125a0, {0xc000126120, 0x9, 0xc15802c98dc9373e?})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc00006acb0 sp=0xc00006ac78 pc=0x5b5eb7
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr io.ReadAtLeast({0xa4bea0, 0xc0001125a0}, {0xc000126120, 0x9, 0x9}, 0x9)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/io/io.go:335 +0x90 fp=0xc00006acf8 sp=0xc00006acb0 pc=0x4baf50
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr io.ReadFull(...)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/io/io.go:354
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr golang.org/x/net/http2.readFrameHeader({0xc000126120, 0x9, 0xc00002a5b8?}, {0xa4bea0?, 0xc0001125a0?})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:237 +0x65 fp=0xc00006ad48 sp=0xc00006acf8 pc=0x750aa5
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001260e0)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:498 +0x85 fp=0xc00006adf0 sp=0xc00006ad48 pc=0x7511e5
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc000102b60, 0x1?)
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc00006af00 sp=0xc00006adf0 pc=0x77e0e5
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001a21e0, {0xa52358?, 0xc000102b60})
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc00006af80 sp=0xc00006af00 pc=0x7ebba2
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc00006afe0 sp=0xc00006af80 pc=0x7eb405
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr runtime.goexit()
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x4769e1
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 22
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rax 0x0
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rbx 0xa6ed20
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rcx 0x7f27d3ffe3c0
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rdx 0x7f282043e6d8
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rdi 0x7f282043e6c8
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rsi 0x7f2820436e38
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rbp 0x7f27d3ffe4e0
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rsp 0x7f27d3ffe160
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r8 0x0
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r9 0x7f27c8000080
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r10 0xfffffffffffff7cd
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r11 0x7f2820341990
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r12 0x1
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r13 0x7f27d3ffe280
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r14 0x7f27d3ffe210
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr r15 0x7f27d3ffe380
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rip 0x83939c
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr rflags 0x10246
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr cs 0x33
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr fs 0x0
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:42213): stderr gs 0x0
12:34AM DBG [llama-ggml] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
12:34AM DBG [llama] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend llama
12:34AM DBG llama-cpp is an alias of llama-cpp
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: llama-cpp): {backendString:llama model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:37677'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager1441916734
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37677: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37677): stdout Server listening on 127.0.0.1:37677
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37677): stdout {"timestamp":1702859688,"level":"ERROR","function":"load_model","line":589,"message":"unable to load model","model":"/models/gpt-3.5-turbo"}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37677): stderr error loading model: failed to open /models/gpt-3.5-turbo: No such file or directory
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37677): stderr llama_load_model_from_file: failed to load model
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37677): stderr llama_init_from_gpt_params: error: failed to load model '/models/gpt-3.5-turbo'
12:34AM DBG [llama] Fails: could not load model: rpc error: code = Canceled desc =
12:34AM DBG [gpt4all] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend gpt4all
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: gpt4all): {backendString:gpt4all model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt4all
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:45537'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager3342732468
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45537: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:45537): stderr 2023/12/18 00:34:48 gRPC Server listening at 127.0.0.1:45537
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:45537): stderr load_model: error 'No such file or directory'
12:34AM DBG [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:34AM DBG [gptneox] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend gptneox
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: gptneox): {backendString:gptneox model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gptneox
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:43629'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager1272713850
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43629: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43629): stderr 2023/12/18 00:34:50 gRPC Server listening at 127.0.0.1:43629
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43629): stderr gpt_neox_model_load: failed to open '/models/gpt-3.5-turbo'
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43629): stderr gpt_neox_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:34AM DBG [gptneox] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:34AM DBG [bert-embeddings] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend bert-embeddings
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: bert-embeddings): {backendString:bert-embeddings model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:35629'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager2322861504
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35629: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35629): stderr 2023/12/18 00:34:52 gRPC Server listening at 127.0.0.1:35629
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35629): stderr bert_load_from_file: failed to open '/models/gpt-3.5-turbo'
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35629): stderr bert_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:34AM DBG [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:34AM DBG [falcon-ggml] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend falcon-ggml
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: falcon-ggml): {backendString:falcon-ggml model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/falcon-ggml
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:37111'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager820972952
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37111: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37111): stderr 2023/12/18 00:34:54 gRPC Server listening at 127.0.0.1:37111
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37111): stderr falcon_model_load: failed to open '/models/gpt-3.5-turbo'
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:37111): stderr falcon_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:34AM DBG [falcon-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:34AM DBG [gptj] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend gptj
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: gptj): {backendString:gptj model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gptj
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:35033'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager3373203473
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35033: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35033): stderr 2023/12/18 00:34:56 gRPC Server listening at 127.0.0.1:35033
12:34AM DBG GRPC Service Ready
12:34AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35033): stderr gptj_model_load: failed to open '/models/gpt-3.5-turbo'
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:35033): stderr gptj_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:34AM DBG [gptj] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:34AM DBG [gpt2] Attempting to load
12:34AM INF Loading model 'gpt-3.5-turbo' with backend gpt2
12:34AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:34AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: gpt2): {backendString:gpt2 model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:34AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt2
12:34AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:46323'
12:34AM DBG GRPC Service state dir: /tmp/go-processmanager3360495548
12:34AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46323: connect: connection refused"
12:34AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:46323): stderr 2023/12/18 00:34:58 gRPC Server listening at 127.0.0.1:46323
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:46323): stderr gpt2_model_load: failed to open '/models/gpt-3.5-turbo'
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:46323): stderr gpt2_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:35AM DBG [gpt2] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:35AM DBG [dolly] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend dolly
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: dolly): {backendString:dolly model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/dolly
12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:34941'
12:35AM DBG GRPC Service state dir: /tmp/go-processmanager1138556341
12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34941: connect: connection refused"
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:34941): stderr 2023/12/18 00:35:00 gRPC Server listening at 127.0.0.1:34941
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:34941): stderr dollyv2_model_load: failed to open '/models/gpt-3.5-turbo'
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:34941): stderr dolly_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:35AM DBG [dolly] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:35AM DBG [mpt] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend mpt
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: mpt): {backendString:mpt model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/mpt
12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:43203'
12:35AM DBG GRPC Service state dir: /tmp/go-processmanager3637118573
12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43203: connect: connection refused"
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43203): stderr 2023/12/18 00:35:02 gRPC Server listening at 127.0.0.1:43203
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43203): stderr mpt_model_load: failed to open '/models/gpt-3.5-turbo'
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:43203): stderr mpt_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:35AM DBG [mpt] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:35AM DBG [replit] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend replit
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: replit): {backendString:replit model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/replit
12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:44361'
12:35AM DBG GRPC Service state dir: /tmp/go-processmanager3399944012
12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44361: connect: connection refused"
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:44361): stderr 2023/12/18 00:35:04 gRPC Server listening at 127.0.0.1:44361
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:44361): stderr replit_model_load: failed to open '/models/gpt-3.5-turbo'
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:44361): stderr replit_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:35AM DBG [replit] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:35AM DBG [starcoder] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend starcoder
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: starcoder): {backendString:starcoder model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/starcoder
12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:38761'
12:35AM DBG GRPC Service state dir: /tmp/go-processmanager1676035297
12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38761: connect: connection refused"
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:38761): stderr 2023/12/18 00:35:06 gRPC Server listening at 127.0.0.1:38761
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:38761): stderr starcoder_model_load: failed to open '/models/gpt-3.5-turbo'
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:38761): stderr starcoder_bootstrap: failed to load model from '/models/gpt-3.5-turbo'
12:35AM DBG [starcoder] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
12:35AM DBG [rwkv] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend rwkv
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: rwkv): {backendString:rwkv model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/rwkv
12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:41077'
12:35AM DBG GRPC Service state dir: /tmp/go-processmanager4196626345
12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41077: connect: connection refused"
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr 2023/12/18 00:35:08 gRPC Server listening at 127.0.0.1:41077
12:35AM DBG GRPC Service Ready
12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0
unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr Failed to open file /models/gpt-3.5-turbo 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv.cpp:1129: file.file 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv.cpp:1266: rwkv_instance_from_file(file_path, *instance.get()) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr panic: runtime error: invalid memory address or nil pointer dereference 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x80778e] 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr goroutine 34 [running]: 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr github.com/donomii/go-rwkv%2ecpp.LoadFiles.(*Context).GetStateBufferElementCount.func1(0xc0002aa018?) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/sources/go-rwkv/wrapper.go:63 +0xe 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr github.com/donomii/go-rwkv%2ecpp.(*Context).GetStateBufferElementCount(...) 
12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/sources/go-rwkv/wrapper.go:63 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr github.com/donomii/go-rwkv%2ecpp.LoadFiles({0xc0002aa018?, 0xc0002aa020?}, {0xc0002e6030, 0x24}, 0x7eab27?) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/sources/go-rwkv/wrapper.go:131 +0x52 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr main.(*LLM).Load(0xc000036cd0, 0xc0002be000) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/backend/go/llm/rwkv/rwkv.go:31 +0xee 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc000036da0, {0xc0002be000?, 0x500a06?}, 0x0?) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/pkg/grpc/server.go:50 +0xe6 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x9157e0?, 0xc000036da0}, {0x9fbcd0, 0xc0002802d0}, 0xc000286080, 0x0) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /build/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001a21e0, {0x9fbcd0, 0xc000280210}, {0x9fee38, 0xc00018d380}, 0xc0002a8000, 0xc0001acc90, 0xcf10b0, 0x0) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xe03 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001a21e0, {0x9fee38, 0xc00018d380}, 0xc0002a8000) 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0xc4c 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1() 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x86 12:35AM DBG 
GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 13 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:41077): stderr /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:997 +0x145 12:35AM DBG [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF 12:35AM DBG [whisper] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend whisper 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: whisper): {backendString:whisper model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper 12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:39981' 12:35AM DBG GRPC Service state dir: /tmp/go-processmanager3293508653 12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39981: connect: connection refused" 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:39981): stderr 2023/12/18 00:35:10 
gRPC Server listening at 127.0.0.1:39981 12:35AM DBG GRPC Service Ready 12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:35AM DBG [whisper] Fails: could not load model: rpc error: code = Unknown desc = stat /models/gpt-3.5-turbo: no such file or directory 12:35AM DBG [stablediffusion] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend stablediffusion 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: stablediffusion): {backendString:stablediffusion model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh 
transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion 12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:36501' 12:35AM DBG GRPC Service state dir: /tmp/go-processmanager1865701971 12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36501: connect: connection refused" 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:36501): stderr 2023/12/18 00:35:12 gRPC Server listening at 127.0.0.1:36501 12:35AM DBG GRPC Service Ready 12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:35AM DBG [stablediffusion] Fails: could not load model: rpc error: code = Unknown desc = stat /models/gpt-3.5-turbo: no such file or directory 12:35AM DBG [piper] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend piper 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: piper): {backendString:piper model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/piper 12:35AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:34307' 12:35AM DBG GRPC Service state dir: /tmp/go-processmanager1826717209 12:35AM DBG GRPC Service Started rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34307: connect: connection refused" 12:35AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:34307): stderr 2023/12/18 00:35:14 gRPC Server listening at 127.0.0.1:34307 12:35AM DBG GRPC Service Ready 12:35AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:700 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/espeak-ng-data RopeFreqBase:0 RopeFreqScale:0 
RMSNormEps:0 NGQA:0 ModelFile:/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} 12:35AM DBG [piper] Fails: could not load model: rpc error: code = Unknown desc = unsupported model type /models/gpt-3.5-turbo (should end with .onnx) 12:35AM DBG [/build/backend/python/vall-e-x/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/vall-e-x/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/vall-e-x/run.sh): {backendString:/build/backend/python/vall-e-x/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/vall-e-x/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/transformers/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/transformers/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/transformers/run.sh): {backendString:/build/backend/python/transformers/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/transformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/autogptq/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/autogptq/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/autogptq/run.sh): {backendString:/build/backend/python/autogptq/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/autogptq/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/exllama/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/exllama/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/exllama/run.sh): {backendString:/build/backend/python/exllama/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/exllama/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/diffusers/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/diffusers/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/diffusers/run.sh): {backendString:/build/backend/python/diffusers/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/diffusers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/transformers-musicgen/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/transformers-musicgen/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/transformers-musicgen/run.sh): {backendString:/build/backend/python/transformers-musicgen/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/transformers-musicgen/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/petals/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/petals/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/petals/run.sh): {backendString:/build/backend/python/petals/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/petals/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/bark/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/bark/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/bark/run.sh): {backendString:/build/backend/python/bark/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/bark/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS 12:35AM DBG [/build/backend/python/vllm/run.sh] Attempting to load 12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/vllm/run.sh 12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo 12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/vllm/run.sh): {backendString:/build/backend/python/vllm/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false} 12:35AM DBG [/build/backend/python/vllm/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
12:35AM DBG [/build/backend/python/exllama2/run.sh] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/exllama2/run.sh
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/exllama2/run.sh): {backendString:/build/backend/python/exllama2/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG [/build/backend/python/exllama2/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh.
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
12:35AM DBG [/build/backend/python/sentencetransformers/run.sh] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/sentencetransformers/run.sh
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/sentencetransformers/run.sh): {backendString:/build/backend/python/sentencetransformers/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh.
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
12:35AM DBG [/build/backend/python/sentencetransformers/run.sh] Attempting to load
12:35AM INF Loading model 'gpt-3.5-turbo' with backend /build/backend/python/sentencetransformers/run.sh
12:35AM DBG Loading model in memory from file: /models/gpt-3.5-turbo
12:35AM DBG Loading Model gpt-3.5-turbo with gRPC (file: /models/gpt-3.5-turbo) (backend: /build/backend/python/sentencetransformers/run.sh): {backendString:/build/backend/python/sentencetransformers/run.sh model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00021a5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:35AM DBG [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh.
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
[127.0.0.1]:60686 200 - GET /readyz
```
that1guy commented 6 months ago

Glad to hear I'm not alone. Let me know if you have a breakthrough. :)

justaCasualCoder commented 6 months ago

Great news, @that1guy! Got it working! Building from source with this modified Dockerfile worked. Here are the steps I used:

that1guy commented 6 months ago

@justaCasualCoder thanks for digging in and sharing your findings! When I manually diffed the Dockerfile you found against the one in the main repo, I noticed that all it was doing was adding the same CPU flags.
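
For anyone else landing here with a SIGILL on an AVX-only CPU, a minimal sketch of the relevant `.env` settings (based on the `.env` at the top of this issue; the exact `CMAKE_ARGS` names come from llama.cpp's build options, so double-check them against your checkout) might look like:

```shell
# Force LocalAI to rebuild its backends from source inside the container
REBUILD=true

# Disable the instruction sets this CPU lacks (AVX2/AVX512/FMA/F16C)
# but leave plain AVX enabled, since the CPU in question supports it.
# Note: the original .env also passed -DLLAMA_AVX=OFF, which is only
# needed if the CPU has no AVX at all.
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_AVX512=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF"
```

With those set, restart the container so the rebuild is triggered; the first start will take noticeably longer while the backends compile.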

Ultimately, I just pulled down the new 2.0.1 Dockerfile and everything worked. I guess I was just experiencing an edge case related to 2.0.0.