Closed by that1guy 10 months ago
FYI - I'm running this container on top of Unraid.
@that1guy Thanks for the report!
SIGILL: illegal instruction
This is due to a required CPU instruction set not being present. Does your CPU support at least AVX? You can run
grep -e "flags" /proc/cpuinfo | head -1
to get this info and see whether avx is present in the flags.
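A scripted version of the same check can be handy, because a plain substring search for avx also matches avx2 and avx512f. The sketch below is only an illustration: the helper names are made up, and the list of extra flags (avx2, fma, f16c) is an assumption about what a SIMD-heavy build might need, not something stated in this thread.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text as a set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx ..."; keep what follows the colon.
            return set(line.split(":", 1)[1].split())
    return set()


def missing_features(flags, wanted=("avx", "avx2", "fma", "f16c")):
    """List the wanted flags that are absent, using whole-word membership."""
    return [name for name in wanted if name not in flags]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string where it does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = cpu_flags(read_cpuinfo())
    print("avx present:", "avx" in flags)
    print("missing:", missing_features(flags))
```

Run it on the host rather than inside the container if the two might report different CPUs, for example under a VM with a restricted CPU model.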
I'm getting the same error with Docker 24.0.7 on top of Ubuntu 20.04.
My CPU flags (highlighted AVX with quotes):
$ grep -e "flags" /proc/cpuinfo | head -1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm
pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave "avx" f16c rdrand lahf_lm cpuid_fault epb pti ssbd
ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
md_clear flush_l1d
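For what it's worth, the flags list above contains avx and f16c but no fma or avx2 entry. The instruction bytes reported next to the SIGILL in the logs further down start with 0xc4, which marks a three-byte VEX prefix. Here is a rough Python sketch of decoding that prefix. It only handles the three-byte VEX form, and the KNOWN table is a tiny hand-written lookup based on the Intel encoding of VFMADD213SS/SD (VEX.66.0F38 A9), so treat the result as a hint to confirm with a real disassembler.

```python
# Opcode map selected by the low five bits of the second VEX byte.
OPCODE_MAPS = {1: "0F", 2: "0F38", 3: "0F3A"}
# Implied legacy prefix selected by the low two bits of the third VEX byte.
SIMD_PREFIXES = {0: "none", 1: "66", 2: "F3", 3: "F2"}

# Hand-written lookup, deliberately tiny: (map, prefix, opcode, W) -> (mnemonic, cpuinfo flag).
KNOWN = {
    ("0F38", "66", 0xA9, 0): ("VFMADD213SS", "fma"),
    ("0F38", "66", 0xA9, 1): ("VFMADD213SD", "fma"),
}


def parse_logged_bytes(text):
    """Turn a log string such as '0xc4 0xe2 0x0' into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def decode_vex3(raw):
    """Decode the fields of a three-byte VEX prefix plus the opcode byte."""
    if len(raw) < 4 or raw[0] != 0xC4:
        raise ValueError("not a three-byte VEX-encoded instruction")
    opcode_map = OPCODE_MAPS.get(raw[1] & 0x1F, "unknown")
    w_bit = (raw[2] >> 7) & 1
    prefix = SIMD_PREFIXES[raw[2] & 0x03]
    opcode = raw[3]
    mnemonic, feature = KNOWN.get(
        (opcode_map, prefix, opcode, w_bit), ("unknown", "unknown")
    )
    return {
        "map": opcode_map,
        "prefix": prefix,
        "w": w_bit,
        "opcode": opcode,
        "mnemonic": mnemonic,
        "feature": feature,
    }


# Bytes copied from the "instruction bytes" line of the crash log.
LOGGED = "0xc4 0xe2 0x71 0xa9 0x15 0xaa 0x79 0x23 0x0 0xc5 0xfa 0x11 0x4c 0x24 0x10 0xc5"

info = decode_vex3(parse_logged_bytes(LOGGED))
print(info["mnemonic"], "needs cpuinfo flag:", info["feature"])
```

If the decode is right, the faulting instruction is an FMA one, which would mean the binary needs more than plain AVX. Whether the image was actually built with FMA enabled is not confirmed anywhere in this thread.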
Request:
$ curl http://mydomain/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{ "model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}] }'
Response:
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
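In case anyone wants to reproduce this from a script instead of curl, here is a minimal Python sketch of the same request, using only the standard library. The function names are hypothetical, the base URL is the placeholder host from the curl command above, and the error handling assumes the server returns a JSON body shaped like the response shown here.

```python
import json
import urllib.error
import urllib.request


def build_request(base_url, model, prompt):
    """Build the same POST request as the curl command in this comment."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def error_summary(body):
    """Return 'code: message' for an error payload, or None if there is no error."""
    parsed = json.loads(body)
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not err:
        return None
    return f"{err.get('code')}: {err.get('message')}"


def ask(base_url, model, prompt, timeout=120):
    """Send the request and return the raw response body as text."""
    request = build_request(base_url, model, prompt)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # An HTTP error status can still carry the JSON error body.
        return exc.read().decode("utf-8")


if __name__ == "__main__":
    body = ask("http://mydomain", "llama-2-7b-chat", "explain kubernetes in a sentence")
    print(error_summary(body) or body)
```

The "error reading from server: EOF" message only says that the backend process went away while loading the model, so the reason has to come from the debug logs.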
Complete Logs:
11:50AM DBG no galleries to load
11:50AM INF Starting LocalAI using 4 threads, with models path: /models
11:50AM INF LocalAI version: v2.0.0 (238fec244ae6c9a66bc7fafd76c7e14671110a6f)
11:50AM DBG Model: llama-2-7b-chat (config: {PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}})
11:50AM DBG Extracting backend assets files to /tmp/localai/backend_data
┌───────────────────────────────────────────────────┐
│ Fiber v2.50.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 74 Processes ........... 1 │
│ Prefork ....... Disabled PID ................. 1 │
└───────────────────────────────────────────────────┘
11:52AM DBG Request received:
11:52AM DBG Configuration read: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
11:52AM DBG Parameters: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
11:52AM DBG Prompt (before templating): explain kubernetes in a sentence
11:52AM DBG Template failed loading: failed loading a template for llama-2-7b-chat.Q4_K_M.gguf
11:52AM DBG Prompt (after templating): explain kubernetes in a sentence
11:52AM DBG Loading model llama from llama-2-7b-chat.Q4_K_M.gguf
11:52AM DBG Loading model in memory from file: /models/llama-2-7b-chat.Q4_K_M.gguf
11:52AM DBG Loading Model llama-2-7b-chat.Q4_K_M.gguf with gRPC (file: /models/llama-2-7b-chat.Q4_K_M.gguf) (backend: llama): {backendString:llama model:llama-2-7b-chat.Q4_K_M.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002aa5a0 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
11:52AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
11:52AM DBG GRPC Service for llama-2-7b-chat.Q4_K_M.gguf will be running at: '127.0.0.1:45427'
11:52AM DBG GRPC Service state dir: /tmp/go-processmanager1396330951
11:52AM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45427: connect: connection refused"
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr 2023/12/15 11:52:34 gRPC Server listening at 127.0.0.1:45427
11:52AM DBG GRPC Service Ready
11:52AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:llama-2-7b-chat.Q4_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/llama-2-7b-chat.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr create_gpt_params: loading model /models/llama-2-7b-chat.Q4_K_M.gguf
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr SIGILL: illegal instruction
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr PC=0x86854d m=0 sigcode=2
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr signal arrived during cgo execution
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr instruction bytes: 0xc4 0xe2 0x71 0xa9 0x15 0xaa 0x79 0x23 0x0 0xc5 0xfa 0x11 0x4c 0x24 0x10 0xc5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 34 [syscall]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.cgocall(0x821ae0, 0xc0001554d8)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/cgocall.go:157 +0x4b fp=0xc0001554b0 sp=0xc000155478 pc=0x4176eb
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x1501460, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200, ...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr _cgo_gotypes.go:266 +0x4f fp=0xc0001554d8 sp=0xc0001554b0 pc=0x8143af
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc000118000, 0x23}, {0xc000110240, 0x7, 0x926460?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/sources/go-llama/llama.go:39 +0x385 fp=0xc0001556e8 sp=0xc0001554d8 pc=0x814da5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr main.(*LLM).Load(0xc000012630, 0xc00014e000)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/llama.go:87 +0xc9c fp=0xc000155900 sp=0xc0001556e8 pc=0x81ed1c
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc000030d90, {0xc00014e000?, 0x50a886?}, 0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:50 +0xe6 fp=0xc0001559b0 sp=0xc000155900 pc=0x81c566
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x997880?, 0xc000030d90}, {0xa7e610, 0xc00010e390}, 0xc000114100, 0x0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc000155a08 sp=0xc0001559b0 pc=0x809829
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001a61e0, {0xa7e610, 0xc00010e2d0}, {0xa81b38, 0xc0001a11e0}, 0xc00013e000, 0xc0001aecc0, 0xd924b0, 0x0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xe03 fp=0xc000155df0 sp=0xc000155a08 pc=0x7f27c3
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001a61e0, {0xa81b38, 0xc0001a11e0}, 0xc00013e000)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0xc4c fp=0xc000155f78 sp=0xc000155df0 pc=0x7f772c
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x86 fp=0xc000155fe0 sp=0xc000155f78 pc=0x7f06c6
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000155fe8 sp=0xc000155fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 13
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:997 +0x145
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 1 [IO wait]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0x4c6b50?, 0xc000197b28?, 0x78?, 0x7b?, 0x4e6edd?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000197b08 sp=0xc000197ae8 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.netpollblock(0x478a72?, 0x416e86?, 0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc000197b40 sp=0xc000197b08 pc=0x4448d7
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.runtime_pollWait(0x7f69fc110eb0, 0x72)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc000197b60 sp=0xc000197b40 pc=0x475925
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*pollDesc).wait(0xc0000ec680?, 0x4?, 0x0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000197b88 sp=0xc000197b60 pc=0x4dfb47
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*pollDesc).waitRead(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*FD).Accept(0xc0000ec680)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc000197c30 sp=0xc000197b88 pc=0x4e502c
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*netFD).accept(0xc0000ec680)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_unix.go:172 +0x29 fp=0xc000197ce8 sp=0xc000197c30 pc=0x640b09
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*TCPListener).accept(0xc0000744c0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock_posix.go:152 +0x1e fp=0xc000197d10 sp=0xc000197ce8 pc=0x657abe
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*TCPListener).Accept(0xc0000744c0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock.go:315 +0x30 fp=0xc000197d40 sp=0xc000197d10 pc=0x656c70
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).Serve(0xc0001a61e0, {0xa7dc20?, 0xc0000744c0})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:852 +0x462 fp=0xc000197e80 sp=0xc000197d40 pc=0x7ef322
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7fff1a554de9?, 0xc000024160?}, {0xa82260?, 0xc000012630})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:178 +0x17d fp=0xc000197f10 sp=0xc000197e80 pc=0x81df5d
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr main.main()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/main.go:20 +0x85 fp=0xc000197f40 sp=0xc000197f10 pc=0x8212c5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.main()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:267 +0x2bb fp=0xc000197fe0 sp=0xc000197f40 pc=0x44b9fb
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000197fe8 sp=0xc000197fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 2 [force gc (idle)]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000054fa8 sp=0xc000054f88 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goparkunlock(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.forcegchelper()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:322 +0xb3 fp=0xc000054fe0 sp=0xc000054fa8 pc=0x44bcd3
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000054fe8 sp=0xc000054fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by runtime.init.6 in goroutine 1
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:310 +0x1a
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 3 [GC sweep wait]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000055778 sp=0xc000055758 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goparkunlock(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.bgsweep(0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0000557c8 sp=0xc000055778 pc=0x437d54
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gcenable.func1()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x25 fp=0xc0000557e0 sp=0xc0000557c8 pc=0x42cf25
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000557e8 sp=0xc0000557e0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by runtime.gcenable in goroutine 1
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x66
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 4 [GC scavenge wait]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0xc00007e000?, 0xa76dc8?, 0x1?, 0x0?, 0xc0000071e0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000055f70 sp=0xc000055f50 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goparkunlock(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.(*scavengerState).park(0xddb960)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000055fa0 sp=0xc000055f70 pc=0x435629
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.bgscavenge(0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000055fc8 sp=0xc000055fa0 pc=0x435bbc
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gcenable.func2()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0x25 fp=0xc000055fe0 sp=0xc000055fc8 pc=0x42cec5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000055fe8 sp=0xc000055fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by runtime.gcenable in goroutine 1
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0xa5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 5 [finalizer wait]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0x9c1d00?, 0x10044cf01?, 0x0?, 0x0?, 0x454005?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000054628 sp=0xc000054608 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.runfinq()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:193 +0x107 fp=0xc0000547e0 sp=0xc000054628 pc=0x42bfa7
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000547e8 sp=0xc0000547e0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by runtime.createfing in goroutine 1
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:163 +0x3d
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 11 [select]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0xc000129f00?, 0x2?, 0x0?, 0x0?, 0xc000129ecc?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000129d78 sp=0xc000129d58 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.selectgo(0xc000129f00, 0xc000129ec8, 0xc000129ee8?, 0x0, 0x95f7a0?, 0x1)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc000129e98 sp=0xc000129d78 pc=0x45b8a5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0000885f0, 0x1)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc000129f30 sp=0xc000129e98 pc=0x768893
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000116070)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc000129f90 sp=0xc000129f30 pc=0x768fc6
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:336 +0xd5 fp=0xc000129fe0 sp=0xc000129f90 pc=0x77f815
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000129fe8 sp=0xc000129fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:333 +0x1acc
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 12 [select]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0xc000057f70?, 0x4?, 0x0?, 0xc1?, 0xc000057ec0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000057d28 sp=0xc000057d08 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.selectgo(0xc000057f70, 0xc000057eb8, 0x0?, 0x0, 0x0?, 0x1)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc000057e48 sp=0xc000057d28 pc=0x45b8a5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0001a11e0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:1152 +0x225 fp=0xc000057fc8 sp=0xc000057e48 pc=0x786ac5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x25 fp=0xc000057fe0 sp=0xc000057fc8 pc=0x77f705
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000057fe8 sp=0xc000057fe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x1b0e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr goroutine 13 [IO wait]:
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.gopark(0x100000000?, 0xb?, 0x0?, 0x0?, 0x6?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00006aaa0 sp=0xc00006aa80 pc=0x44be4e
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.netpollblock(0x4c4dd8?, 0x416e86?, 0x0?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc00006aad8 sp=0xc00006aaa0 pc=0x4448d7
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.runtime_pollWait(0x7f69fc110db8, 0x72)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc00006aaf8 sp=0xc00006aad8 pc=0x475925
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*pollDesc).wait(0xc0000ec800?, 0xc0001dc000?, 0x0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00006ab20 sp=0xc00006aaf8 pc=0x4dfb47
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*pollDesc).waitRead(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr internal/poll.(*FD).Read(0xc0000ec800, {0xc0001dc000, 0x8000, 0x8000})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00006abb8 sp=0xc00006ab20 pc=0x4e0e3a
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*netFD).Read(0xc0000ec800, {0xc0001dc000?, 0x1060100000000?, 0x8?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_posix.go:55 +0x25 fp=0xc00006ac00 sp=0xc00006abb8 pc=0x63eae5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*conn).Read(0xc000058310, {0xc0001dc000?, 0xc00006ac90?, 0x3?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/net.go:179 +0x45 fp=0xc00006ac48 sp=0xc00006ac00 pc=0x64f1e5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr net.(*TCPConn).Read(0x0?, {0xc0001dc000?, 0xc00006aca0?, 0x469d2d?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr <autogenerated>:1 +0x25 fp=0xc00006ac78 sp=0xc00006ac48 pc=0x661985
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr bufio.(*Reader).Read(0xc000026ba0, {0xc0001da120, 0x9, 0xc1572d6105517ab1?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/bufio/bufio.go:244 +0x197 fp=0xc00006acb0 sp=0xc00006ac78 pc=0x5b9f97
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr io.ReadAtLeast({0xa7b640, 0xc000026ba0}, {0xc0001da120, 0x9, 0x9}, 0x9)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:335 +0x90 fp=0xc00006acf8 sp=0xc00006acb0 pc=0x4befd0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr io.ReadFull(...)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:354
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr golang.org/x/net/http2.readFrameHeader({0xc0001da120, 0x9, 0xc000218000?}, {0xa7b640?, 0xc000026ba0?})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:237 +0x65 fp=0xc00006ad48 sp=0xc00006acf8 pc=0x755305
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001da0e0)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:498 +0x85 fp=0xc00006adf0 sp=0xc00006ad48 pc=0x755a45
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0001a11e0, 0x1?)
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc00006af00 sp=0xc00006adf0 pc=0x782965
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001a61e0, {0xa81b38?, 0xc0001a11e0})
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc00006af80 sp=0xc00006af00 pc=0x7f0462
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc00006afe0 sp=0xc00006af80 pc=0x7efcc5
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr runtime.goexit()
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x47aa01
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 10
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr /home/runner/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rax 0x0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rbx 0xe4ffc0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rcx 0x18
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rdx 0x1966912
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rdi 0x1
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rsi 0x7fff1a552a30
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rbp 0xe6ffc0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rsp 0x7fff1a552a10
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r8 0x7fff1a5ec080
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r9 0xe7d7
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r10 0x7fff1a5ec090
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r11 0x1966912
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r12 0xe8ffc0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r13 0xeaffc0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r14 0xe0ffc0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr r15 0x0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rip 0x86854d
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr rflags 0x10202
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr cs 0x33
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr fs 0x0
11:52AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45427): stderr gs 0x0
[192.168.136.2]:57080 500 - POST /v1/chat/completions
If you need more information, I'm glad to help.
@victor-rds Thanks for the report! Can you check whether the same output appears inside a container? Something like docker run ubuntu:22.04 grep -e "flags" /proc/cpuinfo | head -1
Another thing you can check is whether the same issue exists when using the LocalAI binary directly, available from https://github.com/mudler/LocalAI/releases/tag/v2.0.0
The container shows avx:
docker run --rm -it ubuntu:22.04 /bin/grep -e "flags" /proc/cpuinfo | head -1
Unable to find image 'ubuntu:22.04' locally
22.04: Pulling from library/ubuntu
5e8117c0bd28: Already exists
Digest: sha256:8eab65df33a6de2844c9aefd19efe8ddb87b7df5e9185a4ab73af936225685bb
Status: Downloaded newer image for ubuntu:22.04
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid
sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave "avx" f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb
stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
About the second request: the binaries are compiled against a version of GLIBC that's incompatible with my host, which runs an old version of Ubuntu. I haven't had time to fix that yet; I will try and send the results.
The flags are identical on my server and within Docker, and the avx flag is present.
Running grep -e "flags" /proc/cpuinfo | head -1
in Unraid console gives me:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Docker command docker run ubuntu:22.04 grep -e "flags" /proc/cpuinfo | head -1
gives me:
Unable to find image 'ubuntu:22.04' locally
22.04: Pulling from library/ubuntu
5e8117c0bd28: Pulling fs layer
5e8117c0bd28: Verifying Checksum
5e8117c0bd28: Download complete
5e8117c0bd28: Pull complete
Digest: sha256:8eab65df33a6de2844c9aefd19efe8ddb87b7df5e9185a4ab73af936225685bb
Status: Downloaded newer image for ubuntu:22.04
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
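Reading these long flag lines by eye is error-prone, so here is a minimal sketch that reports a few SIMD-related flags one per line. The list checked (avx, avx2, fma, f16c) and the function names has_flag and report_flags are assumptions for illustration; this thread only confirms that avx was asked about, not which other instruction sets the backend needs.

```shell
#!/bin/sh
# Sketch: report which SIMD-related flags appear on a /proc/cpuinfo "flags" line.
# The flag list below is an assumption, not a confirmed LocalAI requirement.

# has_flag FLAG FLAGS_LINE: succeeds if FLAG appears as a whole word,
# so "avx" does not match "avx2".
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_flags FLAGS_LINE: print "name: present" or "name: missing" per flag.
report_flags() {
    for f in avx avx2 fma f16c; do
        if has_flag "$f" "$1"; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
}

# On a Linux host or inside a container, check the first "flags" line.
if [ -r /proc/cpuinfo ]; then
    report_flags "$(grep -e '^flags' /proc/cpuinfo | head -1 | cut -d: -f2)"
fi
```

Run against the flag lines posted above, this would show avx as present and avx2 and fma as missing on both machines. Whether the crash is related to those missing flags is not established in this thread.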
FYI - I opened a ticket in the LocalAI repo after fiddling with CPU flags and rebuilds. Still no luck. https://github.com/mudler/LocalAI/issues/1453
Since you were able to repro with LocalAI locally, this is not an aikit-specific issue. I'll close this for now. If this is addressed in a future version of LocalAI, aikit will automatically pick up the update.
Expected Behavior
First output response should be:
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
Actual Behavior
Response received is:
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
Steps To Reproduce
docker run -d --rm -p 9000:8080 ghcr.io/sozercan/llama2:7b
(Port 9000 because 8080 is already in use)
curl http://localhost:9000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}] }'
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
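For anyone re-testing after a rebuild or an image update, the steps above can be wrapped in a small script. This is only a sketch: BASE_URL and the function names is_error_response, send_request, and check are hypothetical, and the error check is a crude string match on the response shape shown in this issue, not a real JSON parser.

```shell
#!/bin/sh
# Sketch: repeat the request from the repro steps and print a one-line verdict.
# BASE_URL is an assumption matching the 9000:8080 port mapping used above.
BASE_URL=${BASE_URL:-http://localhost:9000}

# is_error_response BODY: succeeds if the body starts with an "error" object,
# as in the 500 response shown in this issue (plain string match).
is_error_response() {
    printf '%s' "$1" | grep -q '^{"error":'
}

# send_request: post the same chat completion request as in the repro steps.
send_request() {
    curl -s "$BASE_URL/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -d '{ "model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}] }'
}

# check: run the request once and report OK or FAIL.
check() {
    body=$(send_request) || { echo "request failed"; return 2; }
    if is_error_response "$body"; then
        echo "FAIL: $body"
        return 1
    fi
    echo "OK"
}
```

After starting the container, source the script and call check. It prints FAIL with the response body if the model still cannot be loaded.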
===============
Logs upon container boot up 👍
Logs upon incoming HTTP request:
Are you willing to submit PRs to contribute to this bug fix?