mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License
22.73k stars 1.73k forks source link

Local AI using CPU instead of GPU - CUDA #1108

Open noblerboy2004 opened 11 months ago

noblerboy2004 commented 11 months ago

LocalAI version:

1.30.0 Latest.

Environment, CPU architecture, OS, and Version:

Window server 2022. Xeon E5 2670v2. GPU Geforece GTX 1070 Describe the bug

LocalAI using CPU instead of GPU. CUDA remains 0% When calling chat completeion.

To Reproduce

Expected behavior

Logs

Additional context

Configure in env:

image

Docker Compose: version: '3.6'

services: api: deploy: resources: reservations: devices:

image

yhyu13 commented 11 months ago

I am new to this project, too. It looks like you need to set up gpu_layer in the config somewhere, but I don't know how.

noblerboy2004 commented 11 months ago

I am new to this project, too. It looks like you need to set up gpu_layer in the config somewhere, but I don't know how.

I also tried to search over internet. however, I could not find out solution :) Hope any experts could read this thread.

localai-bot commented 11 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

_but.... I can also be funny or helpful :smilecat: and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

I see that you are using the LocalAI container image from quay.io, which should have CUDA support enabled by default. However, it seems like the GPU is not being used for some reason. To troubleshoot this issue, I would recommend checking the following:

  1. Ensure that the GPU is properly connected and detected by the system. You can use a tool like NVIDIA X Server to check this.
  2. Check that the CUDA driver is installed and up-to-date. You can use the command nvidia-cuda-toolkit to install the driver if needed.
  3. Ensure that the environment variables are properly set when running LocalAI. Make sure that CUDA_DEVICE_POOL_GPU_OVERRIDE is set to 1 and CUDA_VISIBLE_DEVICES is set to 0-1. You can check this by looking at the logs when starting LocalAI in debug mode (--debug or DEBUG=true).
  4. Check that the model you are using is compatible with CUDA and has been compiled with the correct settings. You can use the command localai models list to see a list of available models and their compatibility information.
  5. Ensure that the container image is up-to-date and has the latest CUDA drivers installed. You can try updating the image if necessary. I hope this helps! Let me know if you have any further questions or need more assistance.

Sources:

noblerboy2004 commented 11 months ago

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

I see that you are using the LocalAI container image from quay.io, which should have CUDA support enabled by default. However, it seems like the GPU is not being used for some reason. To troubleshoot this issue, I would recommend checking the following:

  1. Ensure that the GPU is properly connected and detected by the system. You can use a tool like NVIDIA X Server to check this.
  2. Check that the CUDA driver is installed and up-to-date. You can use the command nvidia-cuda-toolkit to install the driver if needed.
  3. Ensure that the environment variables are properly set when running LocalAI. Make sure that CUDA_DEVICE_POOL_GPU_OVERRIDE is set to 1 and CUDA_VISIBLE_DEVICES is set to 0-1. You can check this by looking at the logs when starting LocalAI in debug mode (--debug or DEBUG=true).
  4. Check that the model you are using is compatible with CUDA and has been compiled with the correct settings. You can use the command localai models list to see a list of available models and their compatibility information.
  5. Ensure that the container image is up-to-date and has the latest CUDA drivers installed. You can try updating the image if necessary. I hope this helps! Let me know if you have any further questions or need more assistance.

Sources:

Hi localai-bot,

Thank you for your support. I tried above guides but still failed.

  1. Ensure that the GPU is properly connected and detected by the system. You can use a tool like NVIDIA X Server to check this image

  2. Check that the CUDA driver is installed and up-to-date. Same above.

  3. Ensure that the environment variables are properly set image

  4. Check that the model you are using is compatible with CUDA. I did downloaded and tried three models below. But still failed. image

  5. I pull latest LocalAI but still not working with CUDA.

Please help to process further. Thank you very much.

lunamidori5 commented 11 months ago

@noblerboy2004 please post your models yaml file for better review

lunamidori5 commented 11 months ago

I am new to this project, too. It looks like you need to set up gpu_layer in the config somewhere, but I don't know how.

https://localai.io/howtos/easy-model-import-downloaded/

noblerboy2004 commented 11 months ago

@noblerboy2004 please post your models yaml file for better review

Hi Lunamidori5,

Thank you for your action.

Here is folder of downloaded models:

image

gpt4all-j-groovy working ok with CPU. No GPU usage -CUDO 0% backend: gpt4all-j context_size: 1024 name: gpt4all-j-groovy parameters: model: ggml-gpt4all-j-v1.3-groovy.bin temperature: 0.2 top_k: 80 top_p: 0.7 template: chat: gpt4all-chat completion: gpt4all-completion main_gpu: "0"

I Tried open-llama-3b-q4_0

backend: llama context_size: 1024 name: openllama f16: true ## If you are using cpu set this to false gpu_layers: 4 batch: 512 parameters: model: open-llama-3b-q4_0.bin temperature: 0.2 top_k: 80 top_p: 0.7 template: chat: openllama-chat completion: openllama-completion roles: assistant: 'ASSISTANT:' system: 'SYSTEM:' user: 'USER:'

Content of openllama-chat.tmpl: Q: {{.Input}}\nA: Content of openllama-completion.tmpl: Q: Complete the following text: {{.Input}}\nA:

And error occured with log: 2023-09-27 06:35:37 11:35PM DBG Request received: 2023-09-27 06:35:37 11:35PM DBG Configuration read: &{PredictionOptions:{Model:open-llama-3b-q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:openllama F16:true Threads:32 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:openllama-chat ChatMessage: Completion:openllama-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:4 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}} 2023-09-27 06:35:37 11:35PM DBG Parameters: &{PredictionOptions:{Model:open-llama-3b-q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:openllama F16:true Threads:32 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:openllama-chat ChatMessage: Completion:openllama-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:4 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}} 2023-09-27 06:35:37 11:35PM DBG Prompt (before templating): USER: How are you? 2023-09-27 06:35:37 11:35PM DBG Template found, input modified to: Q: USER: How are you?\nA: 2023-09-27 06:35:37 2023-09-27 06:35:37 11:35PM DBG Prompt (after templating): Q: USER: How are you?\nA: 2023-09-27 06:35:37 2023-09-27 06:35:37 11:35PM DBG Loading model llama from open-llama-3b-q4_0.bin 2023-09-27 06:35:37 11:35PM DBG Loading model in memory from file: /models/open-llama-3b-q4_0.bin 2023-09-27 06:35:37 11:35PM DBG Loading GRPC Model llama: {backendString:llama model:open-llama-3b-q4_0.bin threads:32 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000503ba0 externalBackends:map[autogptq:/build/extra/grpc/autogptq/autogptq.py bark:/build/extra/grpc/bark/ttsbark.py diffusers:/build/extra/grpc/diffusers/backend_diffusers.py exllama:/build/extra/grpc/exllama/exllama.py huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py vall-e-x:/build/extra/grpc/vall-e-x/ttsvalle.py vllm:/build/extra/grpc/vllm/backend_vllm.py] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false} 2023-09-27 06:35:37 11:35PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama 2023-09-27 06:35:37 11:35PM DBG GRPC Service for open-llama-3b-q4_0.bin will be running at: '127.0.0.1:40367' 2023-09-27 06:35:37 11:35PM DBG GRPC Service state dir: /tmp/go-processmanager3853150060 2023-09-27 06:35:37 11:35PM DBG GRPC Service Started 2023-09-27 06:35:37 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40367: connect: connection refused" 2023-09-27 06:35:37 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023/09/26 23:35:37 gRPC Server listening at 127.0.0.1:40367 2023-09-27 06:35:39 11:35PM DBG GRPC Service Ready 2023-09-27 06:35:39 11:35PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:open-llama-3b-q4_0.bin ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:4 MainGPU: TensorSplit: Threads:32 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/open-llama-3b-q4_0.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false DraftModel: AudioPath: Quantization:} 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr SIGILL: illegal instruction 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr PC=0x89fedc m=3 sigcode=2 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr signal arrived during cgo execution 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr instruction bytes: 0xc4 0xe3 0x7d 0x39 0x8c 0x24 0x18 0x3 0x0 0x0 0x1 0x66 0x89 0x84 0x24 0x0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 38 [syscall]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.cgocall(0x822db0, 0xc000341530) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000341508 sp=0xc0003414d0 pc=0x418c8b 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7f5b58000cd0, 0x400, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x4, 0x200, ...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr _cgo_gotypes.go:267 +0x4f fp=0xc000341530 sp=0xc000341508 pc=0x81808f 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc0002280a0, 0x1e}, {0xc00022b600, 0x9, 0x9370e0?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/go-llama/llama.go:39 +0x385 fp=0xc000341740 sp=0xc000341530 pc=0x818a85 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/LocalAI/pkg/backend/llm/llama.(LLM).Load(0xc000012630, 0xc000300820) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/pkg/backend/llm/llama/llama.go:87 +0xc9c fp=0xc000341958 sp=0xc000341740 pc=0x81e11c 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/LocalAI/pkg/grpc.(server).LoadModel(0xc000036d90, {0xc000300820?, 0x50e946?}, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/pkg/grpc/server.go:50 +0xe6 fp=0xc000341a08 sp=0xc000341958 pc=0x820e46 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x9a95a0?, 0xc000036d90}, {0xa90270, 0xc00022c5d0}, 0xc0003460e0, 0x0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc000341a60 sp=0xc000341a08 pc=0x80d4a9 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).processUnaryRPC(0xc0001fc1e0, {0xa933f8, 0xc0003001a0}, 0xc000356000, 0xc0001fecc0, 0x1189570, 0x0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:1376 +0xde7 fp=0xc000341e40 sp=0xc000341a60 pc=0x7f6767 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).handleStream(0xc0001fc1e0, {0xa933f8, 0xc0003001a0}, 0xc000356000, 0x0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:1753 +0x9e7 fp=0xc000341f68 sp=0xc000341e40 pc=0x7fb427 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).serveStreams.func1.1() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:998 +0x8d fp=0xc000341fe0 sp=0xc000341f68 pc=0x7f450d 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000341fe8 sp=0xc000341fe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by google.golang.org/grpc.(Server).serveStreams.func1 in goroutine 37 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:996 +0x165 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 1 [IO wait]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0x42aea8?, 0x7f5b5ef00228?, 0x78?, 0xdb?, 0x4e847d?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0001edb08 sp=0xc0001edae8 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.netpollblock(0xc0001edb98?, 0x418426?, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0001edb40 sp=0xc0001edb08 pc=0x445ed7 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.runtime_pollWait(0x7f5b5ef82eb0, 0x72) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0001edb60 sp=0xc0001edb40 pc=0x476ee5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(pollDesc).wait(0xc0001b8680?, 0x0?, 0x0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001edb88 sp=0xc0001edb60 pc=0x4e10e7 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(pollDesc).waitRead(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(FD).Accept(0xc0001b8680) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0001edc30 sp=0xc0001edb88 pc=0x4e65cc 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(netFD).accept(0xc0001b8680) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc0001edce8 sp=0xc0001edc30 pc=0x644a69 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(TCPListener).accept(0xc0000c04c0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0001edd10 sp=0xc0001edce8 pc=0x65ba1e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(TCPListener).Accept(0xc0000c04c0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc0001edd40 sp=0xc0001edd10 pc=0x65abd0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).Serve(0xc0001fc1e0, {0xa8f828?, 0xc0000c04c0}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:859 +0x462 fp=0xc0001ede80 sp=0xc0001edd40 pc=0x7f31c2 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffdb27f5b6b?, 0xc000024160?}, {0xa93ee0?, 0xc000012630}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/pkg/grpc/server.go:178 +0x17d fp=0xc0001edf10 sp=0xc0001ede80 pc=0x82283d 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr main.main() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /build/cmd/grpc/llama/main.go:22 +0x85 fp=0xc0001edf40 sp=0xc0001edf10 pc=0x8229e5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.main() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc0001edfe0 sp=0xc0001edf40 pc=0x44cffb 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001edfe8 sp=0xc0001edfe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 2 [force gc (idle)]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a0fa8 sp=0xc0000a0f88 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goparkunlock(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.forcegchelper() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc0000a0fe0 sp=0xc0000a0fa8 pc=0x44d2d3 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a0fe8 sp=0xc0000a0fe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by runtime.init.6 in goroutine 1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:310 +0x1a 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 3 [GC sweep wait]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a1778 sp=0xc0000a1758 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goparkunlock(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.bgsweep(0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0000a17c8 sp=0xc0000a1778 pc=0x439354 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gcenable.func1() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000a17e0 sp=0xc0000a17c8 pc=0x42e4e5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a17e8 sp=0xc0000a17e0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by runtime.gcenable in goroutine 1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgc.go:200 +0x66 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 4 [GC scavenge wait]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0xc0000ca000?, 0xa88a70?, 0x1?, 0x0?, 0xc0000071e0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a1f70 sp=0xc0000a1f50 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goparkunlock(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.(scavengerState).park(0x11d2900) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a1fa0 sp=0xc0000a1f70 pc=0x436be9 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.bgscavenge(0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a1fc8 sp=0xc0000a1fa0 pc=0x43717c 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gcenable.func2() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc0000a1fe0 sp=0xc0000a1fc8 pc=0x42e485 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by runtime.gcenable in goroutine 1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mgc.go:201 +0xa5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 5 [finalizer wait]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0x9d39e0?, 0x10044e501?, 0x0?, 0x0?, 0x455605?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a0628 sp=0xc0000a0608 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.runfinq() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000a07e0 sp=0xc0000a0628 pc=0x42d567 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a07e8 sp=0xc0000a07e0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by runtime.createfing in goroutine 1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/mfinal.go:163 +0x3d 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 35 [select]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0xc00034ff00?, 0x2?, 0x0?, 0x0?, 0xc00034fecc?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00034fd78 sp=0xc00034fd58 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.selectgo(0xc00034ff00, 0xc00034fec8, 0xc00034fee8?, 0x0, 0x96f980?, 0x1) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00034fe98 sp=0xc00034fd78 pc=0x45cea5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc00031e050, 0x1) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/controlbuf.go:418 +0x113 fp=0xc00034ff30 sp=0xc00034fe98 pc=0x76c193 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc000346000) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/controlbuf.go:552 +0x86 fp=0xc00034ff90 sp=0xc00034ff30 pc=0x76c8c6 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:341 +0xd5 fp=0xc00034ffe0 sp=0xc00034ff90 pc=0x783835 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00034ffe8 sp=0xc00034ffe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:338 +0x1b0c 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 36 [select]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0xc000306f70?, 0x4?, 0xe0?, 0x5?, 0xc000306ec0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000306d28 sp=0xc000306d08 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.selectgo(0xc000306f70, 0xc000306eb8, 0x0?, 0x0, 0x0?, 0x1) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000306e48 sp=0xc000306d28 pc=0x45cea5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.(http2Server).keepalive(0xc0003001a0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:1155 +0x225 fp=0xc000306fc8 sp=0xc000306e48 pc=0x78ac85 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:344 +0x25 fp=0xc000306fe0 sp=0xc000306fc8 pc=0x783725 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000306fe8 sp=0xc000306fe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:344 +0x1b4e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr goroutine 37 [IO wait]: 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.gopark(0x11eaa60?, 0xb?, 0x0?, 0x0?, 0x6?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00030eaa8 sp=0xc00030ea88 pc=0x44d44e 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.netpollblock(0x4c6378?, 0x418426?, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00030eae0 sp=0xc00030eaa8 pc=0x445ed7 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.runtime_pollWait(0x7f5b5ef82db8, 0x72) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00030eb00 sp=0xc00030eae0 pc=0x476ee5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(pollDesc).wait(0xc00022a000?, 0xc000316000?, 0x0) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00030eb28 sp=0xc00030eb00 pc=0x4e10e7 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(pollDesc).waitRead(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr internal/poll.(FD).Read(0xc00022a000, {0xc000316000, 0x8000, 0x8000}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00030ebc0 sp=0xc00030eb28 pc=0x4e23da 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(netFD).Read(0xc00022a000, {0xc000316000?, 0x1060100000000?, 0x8?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00030ec08 sp=0xc00030ebc0 pc=0x642a45 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(conn).Read(0xc00022e000, {0xc000316000?, 0x0?, 0xc00030ecd8?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/net/net.go:179 +0x45 fp=0xc00030ec50 sp=0xc00030ec08 pc=0x653145 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr net.(TCPConn).Read(0x0?, {0xc000316000?, 0xc00030eca8?, 0x46b32d?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr :1 +0x25 fp=0xc00030ec80 sp=0xc00030ec50 pc=0x6658e5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr bufio.(Reader).Read(0xc000314000, {0xc000328040, 0x9, 0xc13cf892d832e0b5?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc00030ecb8 sp=0xc00030ec80 pc=0x5bdf17 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr io.ReadAtLeast({0xa8d2e0, 0xc000314000}, {0xc000328040, 0x9, 0x9}, 0x9) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/io/io.go:335 +0x90 fp=0xc00030ed00 sp=0xc00030ecb8 pc=0x4c0570 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr io.ReadFull(...) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/io/io.go:354 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr golang.org/x/net/http2.readFrameHeader({0xc000328040, 0x9, 0xc00029c048?}, {0xa8d2e0?, 0xc000314000?}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:237 +0x65 fp=0xc00030ed50 sp=0xc00030ed00 pc=0x758f25 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr golang.org/x/net/http2.(Framer).ReadFrame(0xc000328000) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:498 +0x85 fp=0xc00030edf8 sp=0xc00030ed50 pc=0x759665 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc/internal/transport.(http2Server).HandleStreams(0xc0003001a0, 0x0?, 0x0?) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:642 +0x165 fp=0xc00030ef10 sp=0xc00030edf8 pc=0x786aa5 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).serveStreams(0xc0001fc1e0, {0xa933f8?, 0xc0003001a0}) 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:985 +0x149 fp=0xc00030ef80 sp=0xc00030ef10 pc=0x7f4289 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr google.golang.org/grpc.(Server).handleRawConn.func1() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:927 +0x45 fp=0xc00030efe0 sp=0xc00030ef80 pc=0x7f3b65 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr runtime.goexit() 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00030efe8 sp=0xc00030efe0 pc=0x47bfc1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 34 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:926 +0x185 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rax 0x0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rbx 0xab7620 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rcx 0x7f5b650341a0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rdx 0x7f5bd563d6d8 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rdi 0x7f5bd563d6c8 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rsi 0x7f5bd5635e38 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rbp 0x7f5b650342c0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rsp 0x7f5b65033f40 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r8 0x0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r9 0x7f5b58000080 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r10 0xfffffffffffffaac 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r11 0x7f5bd5540990 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r12 0x1 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r13 0x7f5b65034060 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r14 0x7f5b65033ff0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr r15 0x7f5b65034160 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rip 0x89fedc 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr rflags 0x10246 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr cs 0x33 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr fs 0x0 2023-09-27 06:35:39 11:35PM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:40367): stderr gs 0x0 2023-09-27 06:35:39 [172.18.0.1]:40862 500 - POST /v1/chat/completions 2023-09-27 06:35:42 [127.0.0.1]:46062 200 - GET /readyz

lunamidori5 commented 11 months ago

@noblerboy2004 You have GPU layers set to 0, So 0% of your GPU will be used... Here is a fixed yaml for your easy copy and paste make sure to RESTART localai after changing a yaml file

backend: llama-stable
context_size: 1024
name: openllama
f16: true 
gpu_layers: 30
parameters:
model: open-llama-3b-q4_0.bin
temperature: 0.2
top_k: 80
top_p: 0.7
template:
chat: openllama-chat
completion: openllama-completion
roles:
assistant: 'ASSISTANT:'
system: 'SYSTEM:'
user: 'USER:'
backend: llama-stable
context_size: 1024
f16: true 
gpu_layers: 30
name: gpt4all-j-groovy
parameters:
model: ggml-gpt4all-j-v1.3-groovy.bin
temperature: 0.2
top_k: 80
top_p: 0.7
template:
chat: gpt4all-chat
completion: gpt4all-completion

You will need to fix the formatting of the yaml files before restarting localai

lunamidori5 commented 11 months ago

As a note, gpt4all is not fully supported at this time, and the open-llama model uses llama-stable not llama If you would like more info on setting up a model

https://localai.io/howtos/ https://localai.io/howtos/easy-model-import-downloaded/ https://localai.io/advanced/

noblerboy2004 commented 11 months ago
backend: llama-stable
context_size: 1024
name: openllama
f16: true 
gpu_layers: 30
parameters:
model: open-llama-3b-q4_0.bin
temperature: 0.2
top_k: 80
top_p: 0.7
template:
chat: openllama-chat
completion: openllama-completion
roles:
assistant: 'ASSISTANT:'
system: 'SYSTEM:'
user: 'USER:'

Hi luminadori5,

Thank you for your quick action. I did above guide. However, still failed with openllama with log: 2023-09-27 07:04:18 [127.0.0.1]:59474 200 - GET /readyz 2023-09-27 07:04:20 12:04AM DBG Request received: 2023-09-27 07:04:20 12:04AM DBG Configuration read: &{PredictionOptions:{Model:open-llama-3b-q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:openllama F16:true Threads:32 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:openllama-chat ChatMessage: Completion:openllama-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1000 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}} 2023-09-27 07:04:20 12:04AM DBG Parameters: &{PredictionOptions:{Model:open-llama-3b-q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:openllama F16:true Threads:32 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:openllama-chat ChatMessage: Completion:openllama-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1000 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}} 2023-09-27 07:04:20 12:04AM DBG Prompt (before templating): USER: How are you? 2023-09-27 07:04:20 12:04AM DBG Template found, input modified to: Q: USER: How are you?\nA: 2023-09-27 07:04:20 2023-09-27 07:04:20 12:04AM DBG Prompt (after templating): Q: USER: How are you?\nA: 2023-09-27 07:04:20 2023-09-27 07:04:20 12:04AM DBG Loading model llama from open-llama-3b-q4_0.bin 2023-09-27 07:04:20 12:04AM DBG Loading model in memory from file: /models/open-llama-3b-q4_0.bin 2023-09-27 07:04:20 12:04AM DBG Loading GRPC Model llama: {backendString:llama model:open-llama-3b-q4_0.bin threads:32 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000681040 externalBackends:map[autogptq:/build/extra/grpc/autogptq/autogptq.py bark:/build/extra/grpc/bark/ttsbark.py diffusers:/build/extra/grpc/diffusers/backend_diffusers.py exllama:/build/extra/grpc/exllama/exllama.py huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py vall-e-x:/build/extra/grpc/vall-e-x/ttsvalle.py vllm:/build/extra/grpc/vllm/backend_vllm.py] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false} 2023-09-27 07:04:20 12:04AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama 2023-09-27 07:04:20 12:04AM DBG GRPC Service for open-llama-3b-q4_0.bin will be running at: '127.0.0.1:34155' 2023-09-27 07:04:20 12:04AM DBG GRPC Service state dir: /tmp/go-processmanager1578684964 2023-09-27 07:04:20 12:04AM DBG GRPC Service Started 2023-09-27 07:04:20 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34155: connect: connection refused" 2023-09-27 07:04:21 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023/09/27 00:04:21 gRPC Server listening at 127.0.0.1:34155 2023-09-27 07:04:22 12:04AM DBG GRPC Service Ready 2023-09-27 07:04:22 12:04AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:open-llama-3b-q4_0.bin ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:1000 MainGPU: TensorSplit: Threads:32 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/open-llama-3b-q4_0.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false DraftModel: AudioPath: Quantization:} 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr SIGILL: illegal instruction 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr PC=0x89fedc m=3 sigcode=2 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr signal arrived during cgo execution 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr instruction bytes: 0xc4 0xe3 0x7d 0x39 0x8c 0x24 0x18 0x3 0x0 0x0 0x1 0x66 0x89 0x84 0x24 0x0 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 22 [syscall]: 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.cgocall(0x822db0, 0xc000195530) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000195508 sp=0xc0001954d0 pc=0x418c8b 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7f3428000cd0, 0x400, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x3e8, 0x200, ...) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr _cgo_gotypes.go:267 +0x4f fp=0xc000195530 sp=0xc000195508 pc=0x81808f 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc0001200c0, 0x1e}, {0xc00012f600, 0x9, 0x9370e0?}) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/go-llama/llama.go:39 +0x385 fp=0xc000195740 sp=0xc000195530 pc=0x818a85 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/LocalAI/pkg/backend/llm/llama.(LLM).Load(0xc000012630, 0xc0001029c0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/pkg/backend/llm/llama/llama.go:87 +0xc9c fp=0xc000195958 sp=0xc000195740 pc=0x81e11c 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/LocalAI/pkg/grpc.(server).LoadModel(0xc000036d90, {0xc0001029c0?, 0x50e946?}, 0x0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/pkg/grpc/server.go:50 +0xe6 fp=0xc000195a08 sp=0xc000195958 pc=0x820e46 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x9a95a0?, 0xc000036d90}, {0xa90270, 0xc00012a600}, 0xc00011e150, 0x0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc000195a60 sp=0xc000195a08 pc=0x80d4a9 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).processUnaryRPC(0xc0001fc1e0, {0xa933f8, 0xc000102340}, 0xc000152000, 0xc0001fecc0, 0x1189570, 0x0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:1376 +0xde7 fp=0xc000195e40 sp=0xc000195a60 pc=0x7f6767 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).handleStream(0xc0001fc1e0, {0xa933f8, 0xc000102340}, 0xc000152000, 0x0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:1753 +0x9e7 fp=0xc000195f68 sp=0xc000195e40 pc=0x7fb427 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).serveStreams.func1.1() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:998 +0x8d fp=0xc000195fe0 sp=0xc000195f68 pc=0x7f450d 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000195fe8 sp=0xc000195fe0 pc=0x47bfc1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by google.golang.org/grpc.(Server).serveStreams.func1 in goroutine 21 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:996 +0x165 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 1 [IO wait]: 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0x4c80f0?, 0xc0001edb28?, 0x78?, 0xdb?, 0x4e847d?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0001edb08 sp=0xc0001edae8 pc=0x44d44e 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.netpollblock(0x47a032?, 0x418426?, 0x0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0001edb40 sp=0xc0001edb08 pc=0x445ed7 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.runtime_pollWait(0x7f343829feb0, 0x72) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0001edb60 sp=0xc0001edb40 pc=0x476ee5 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(pollDesc).wait(0xc0001b8680?, 0x4?, 0x0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001edb88 sp=0xc0001edb60 pc=0x4e10e7 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(pollDesc).waitRead(...) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(FD).Accept(0xc0001b8680) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0001edc30 sp=0xc0001edb88 pc=0x4e65cc 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(netFD).accept(0xc0001b8680) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc0001edce8 sp=0xc0001edc30 pc=0x644a69 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(TCPListener).accept(0xc0000c04c0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0001edd10 sp=0xc0001edce8 pc=0x65ba1e 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(TCPListener).Accept(0xc0000c04c0) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc0001edd40 sp=0xc0001edd10 pc=0x65abd0 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).Serve(0xc0001fc1e0, {0xa8f828?, 0xc0000c04c0}) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:859 +0x462 fp=0xc0001ede80 sp=0xc0001edd40 pc=0x7f31c2 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffd405ffb6b?, 0xc000024160?}, {0xa93ee0?, 0xc000012630}) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/pkg/grpc/server.go:178 +0x17d fp=0xc0001edf10 sp=0xc0001ede80 pc=0x82283d 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr main.main() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /build/cmd/grpc/llama/main.go:22 +0x85 fp=0xc0001edf40 sp=0xc0001edf10 pc=0x8229e5 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.main() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc0001edfe0 sp=0xc0001edf40 pc=0x44cffb 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001edfe8 sp=0xc0001edfe0 pc=0x47bfc1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 2 [force gc (idle)]: 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a0fa8 sp=0xc0000a0f88 pc=0x44d44e 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goparkunlock(...) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.forcegchelper() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc0000a0fe0 sp=0xc0000a0fa8 pc=0x44d2d3 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a0fe8 sp=0xc0000a0fe0 pc=0x47bfc1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by runtime.init.6 in goroutine 1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:310 +0x1a 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 3 [GC sweep wait]: 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a1778 sp=0xc0000a1758 pc=0x44d44e 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goparkunlock(...) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.bgsweep(0x0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0000a17c8 sp=0xc0000a1778 pc=0x439354 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gcenable.func1() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000a17e0 sp=0xc0000a17c8 pc=0x42e4e5 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a17e8 sp=0xc0000a17e0 pc=0x47bfc1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by runtime.gcenable in goroutine 1 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgc.go:200 +0x66 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 4 [GC scavenge wait]: 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0xc0000ca000?, 0xa88a70?, 0x1?, 0x0?, 0xc0000071e0?) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a1f70 sp=0xc0000a1f50 pc=0x44d44e 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goparkunlock(...) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:404 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.(scavengerState).park(0x11d2900) 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a1fa0 sp=0xc0000a1f70 pc=0x436be9 2023-09-27 07:04:22 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.bgscavenge(0x0?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a1fc8 sp=0xc0000a1fa0 pc=0x43717c 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gcenable.func2() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc0000a1fe0 sp=0xc0000a1fc8 pc=0x42e485 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x47bfc1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by runtime.gcenable in goroutine 1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mgc.go:201 +0xa5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 5 [finalizer wait]: 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0x9d39e0?, 0x10044e501?, 0x0?, 0x0?, 0x455605?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000a0628 sp=0xc0000a0608 pc=0x44d44e 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.runfinq() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000a07e0 sp=0xc0000a0628 pc=0x42d567 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a07e8 sp=0xc0000a07e0 pc=0x47bfc1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by runtime.createfing in goroutine 1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/mfinal.go:163 +0x3d 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 19 [select]: 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0xc00014ff00?, 0x2?, 0x0?, 0x0?, 0xc00014fecc?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00014fd78 sp=0xc00014fd58 pc=0x44d44e 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.selectgo(0xc00014ff00, 0xc00014fec8, 0xc00014fee8?, 0x0, 0x96f980?, 0x1) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00014fe98 sp=0xc00014fd78 pc=0x45cea5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.(controlBuffer).get(0xc0001141e0, 0x1) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/controlbuf.go:418 +0x113 fp=0xc00014ff30 sp=0xc00014fe98 pc=0x76c193 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.(loopyWriter).run(0xc00011e070) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/controlbuf.go:552 +0x86 fp=0xc00014ff90 sp=0xc00014ff30 pc=0x76c8c6 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:341 +0xd5 fp=0xc00014ffe0 sp=0xc00014ff90 pc=0x783835 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00014ffe8 sp=0xc00014ffe0 pc=0x47bfc1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 18 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:338 +0x1b0c 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 20 [select]: 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0xc00009c770?, 0x4?, 0xe0?, 0x6?, 0xc00009c6c0?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00009c528 sp=0xc00009c508 pc=0x44d44e 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.selectgo(0xc00009c770, 0xc00009c6b8, 0x0?, 0x0, 0x0?, 0x1) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00009c648 sp=0xc00009c528 pc=0x45cea5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.(http2Server).keepalive(0xc000102340) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:1155 +0x225 fp=0xc00009c7c8 sp=0xc00009c648 pc=0x78ac85 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:344 +0x25 fp=0xc00009c7e0 sp=0xc00009c7c8 pc=0x783725 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00009c7e8 sp=0xc00009c7e0 pc=0x47bfc1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 18 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:344 +0x1b4e 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr goroutine 21 [IO wait]: 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.gopark(0x11eaa60?, 0xb?, 0x0?, 0x0?, 0x6?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000b1aa8 sp=0xc0000b1a88 pc=0x44d44e 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.netpollblock(0x4c6378?, 0x418426?, 0x0?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0000b1ae0 sp=0xc0000b1aa8 pc=0x445ed7 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.runtime_pollWait(0x7f343829fdb8, 0x72) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0000b1b00 sp=0xc0000b1ae0 pc=0x476ee5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(pollDesc).wait(0xc00012e000?, 0xc000130000?, 0x0) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000b1b28 sp=0xc0000b1b00 pc=0x4e10e7 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(pollDesc).waitRead(...) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr internal/poll.(FD).Read(0xc00012e000, {0xc000130000, 0x8000, 0x8000}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0000b1bc0 sp=0xc0000b1b28 pc=0x4e23da 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(netFD).Read(0xc00012e000, {0xc000130000?, 0x1060100000000?, 0x8?}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc0000b1c08 sp=0xc0000b1bc0 pc=0x642a45 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(conn).Read(0xc000116008, {0xc000130000?, 0x0?, 0xc0000b1cd8?}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/net/net.go:179 +0x45 fp=0xc0000b1c50 sp=0xc0000b1c08 pc=0x653145 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr net.(TCPConn).Read(0x0?, {0xc000130000?, 0xc0000b1ca8?, 0x46b32d?}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr :1 +0x25 fp=0xc0000b1c80 sp=0xc0000b1c50 pc=0x6658e5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr bufio.(Reader).Read(0xc0001102a0, {0xc000140040, 0x9, 0xc13cfa41baf16b48?}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc0000b1cb8 sp=0xc0000b1c80 pc=0x5bdf17 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr io.ReadAtLeast({0xa8d2e0, 0xc0001102a0}, {0xc000140040, 0x9, 0x9}, 0x9) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/io/io.go:335 +0x90 fp=0xc0000b1d00 sp=0xc0000b1cb8 pc=0x4c0570 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr io.ReadFull(...) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/io/io.go:354 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr golang.org/x/net/http2.readFrameHeader({0xc000140040, 0x9, 0xc00028e000?}, {0xa8d2e0?, 0xc0001102a0?}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:237 +0x65 fp=0xc0000b1d50 sp=0xc0000b1d00 pc=0x758f25 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr golang.org/x/net/http2.(Framer).ReadFrame(0xc000140000) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:498 +0x85 fp=0xc0000b1df8 sp=0xc0000b1d50 pc=0x759665 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc/internal/transport.(http2Server).HandleStreams(0xc000102340, 0x0?, 0x0?) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_server.go:642 +0x165 fp=0xc0000b1f10 sp=0xc0000b1df8 pc=0x786aa5 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).serveStreams(0xc0001fc1e0, {0xa933f8?, 0xc000102340}) 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:985 +0x149 fp=0xc0000b1f80 sp=0xc0000b1f10 pc=0x7f4289 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr google.golang.org/grpc.(Server).handleRawConn.func1() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:927 +0x45 fp=0xc0000b1fe0 sp=0xc0000b1f80 pc=0x7f3b65 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr runtime.goexit() 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000b1fe8 sp=0xc0000b1fe0 pc=0x47bfc1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 18 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr /go/pkg/mod/google.golang.org/grpc@v1.58.2/server.go:926 +0x185 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rax 0x0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rbx 0xab7620 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rcx 0x7f3438b111a0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rdx 0x7f34a911a6d8 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rdi 0x7f34a911a6c8 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rsi 0x7f34a9112e38 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rbp 0x7f3438b112c0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rsp 0x7f3438b10f40 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r8 0x0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r9 0x7f3428000080 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r10 0xfffffffffffffaac 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r11 0x7f34a901d990 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r12 0x1 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r13 0x7f3438b11060 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r14 0x7f3438b10ff0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr r15 0x7f3438b11160 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rip 0x89fedc 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr rflags 0x10246 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr cs 0x33 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr fs 0x0 2023-09-27 07:04:23 12:04AM DBG GRPC(open-llama-3b-q4_0.bin-127.0.0.1:34155): stderr gs 0x0 2023-09-27 07:04:23 [172.18.0.1]:59482 500 - POST /v1/chat/completions

For gpt4all-j-groovy, when changing backend to llama-stable, get similar problem above.

THank you.

noblerboy2004 commented 11 months ago

When i tried to set gpu-layer for gpt4all-j-groovy with gpt4all backend. image

LocalAi use CPU instead of GPU. image

noblerboy2004 commented 11 months ago

I am new to this project, too. It looks like you need to set up gpu_layer in the config somewhere, but I don't know how.

Hi Lunamidori5,yhyu13,

I tried with the following steps below and working now for lunademo. CUDA Not working (Note: when docker compose again, all previous install will be erased)

Follow the link: https://localai.io/howtos/easy-model-import-downloaded/

  1. Download model: https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGML/blob/main/luna-ai-llama2-uncensored.ggmlv3.q5_K_M.bin

  2. Create 3 file In the "lunademo-chat.tmpl" file add {{.Input}} ASSISTANT: In the "lunademo-completion.tmpl" file add Complete the following sentence: {{.Input}} In the "lunademo.yaml" file (If you want to see advanced yaml configs - Link) backend: llama-stable context_size: 2000 f16: true ## If you are using cpu set this to false gpu_layers: 30 batch: 512 name: lunademo parameters: model: luna-ai-llama2-uncensored.ggmlv3.q5_K_M.bin temperature: 0.2 top_k: 40 top_p: 0.65 roles: assistant: 'ASSISTANT:' system: 'SYSTEM:' user: 'USER:' template: chat: lunademo-chat completion: lunademo-completion

  3. Edit .env file: Notice to remove -DLLAMA_AVX=OFF out of the string CMAKE. Because our CPU support AVX (not support AVX2, AVX512)

    CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" CUDA_VISIBLE_DEVICES=0-1 CUDA_DEVICE_POOL_GPU_OVERRIDE=1

    Set number of threads.

    Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.

    THREADS=32

    Specify a different bind address (defaults to ":8080")

    ADDRESS=127.0.0.1:8080

    Default models context size

    CONTEXT_SIZE=512

    #

    Define galleries.

    models will to install will be visible in /models/available

    GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

    CORS settings

    CORS=true

    CORS_ALLOW_ORIGINS=*

    Default path for models

    # MODELS_PATH=/models

    Enable debug mode

    DEBUG=true

    Disables COMPEL (Diffusers)

    COMPEL=0

    Enable/Disable single backend (useful if only one GPU is available)

    SINGLE_ACTIVE_BACKEND=true

    Specify a build type. Available: cublas, openblas, clblas.

    cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.

    OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.

    clBLAS: This is an open-source implementation of the BLAS library that uses OpenCL, a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. clBLAS is designed to take advantage of the parallel computing power of GPUs but can also run on any hardware that supports OpenCL. This includes hardware from different vendors like Nvidia, AMD, and Intel.

    BUILD_TYPE=cublas

    Uncomment and set to true to enable rebuilding from source

    REBUILD=true

    Enable go tags, available: stablediffusion, tts

    stablediffusion: image generation with stablediffusion

    tts: enables text-to-speech with go-piper

    (requires REBUILD=true)

    #

    GO_TAGS=stablediffusion

    Path where to store generated images

    IMAGE_PATH=/tmp

    Specify a default upload limit in MB (whisper)

    UPLOAD_LIMIT

    List of external GRPC backends (note on the container image this variable is already set to use extra backends available in extra/)

    EXTERNAL_GRPC_BACKENDS=my-backend:127.0.0.1:9000,my-backend2:/usr/bin/backend.py

    Advanced settings

    Those are not really used by LocalAI, but from components in the stack

    Preload libraries

    LD_PRELOAD=

    Huggingface cache for models

    HUGGINGFACE_HUB_CACHE=/usr/local/huggingface

    Python backends GRPC max workers

    Default number of workers for GRPC Python backends.

    This actually controls wether a backend can process multiple requests or not.

    PYTHON_GRPC_MAX_WORKERS=1

  4. Edit dockercompose file version: '3.6'

    services: api: deploy: resources: reservations: devices:

    • driver: nvidia count: 1 capabilities: [gpu] image: quay.io/go-skynet/local-ai:master-cublas-cuda12 tty: true # enable colorized logs restart: always # should this be on-failure ? ports:
      • 8080:8080 env_file:
      • .env volumes:
      • ./models:/models command: ["/usr/bin/local-ai" ]
        1. Run command: docker-compose up -d --pull always

Hope that helpfull.

Thank you and have a nice day.

qingfenghcy commented 5 months ago

Hello, I also had a problem when using gpu version. Have you solved your problem?

longunmin commented 3 months ago

I am having the same issue, despite having gpu set in docker-compose and setting gpu_layers in the yaml. Could it be a docker issue?

noblerboy2004 commented 2 weeks ago

Hello, I also had a problem when using gpu version. Have you solved your problem?

yes. The comment above show the way fixing my problem. image