mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

[LocalAI + K8sGPT] GRPC connection error and response failure ( impacting all users ) #819

Closed · AlexsJones closed this 1 year ago

AlexsJones commented 1 year ago

LocalAI version: latest

Environment, CPU architecture, OS, and Version: amd64 thinkpad + kind

Describe the bug: We can see LocalAI receives the prompt but fails to respond to the request.

To Reproduce

  1. Install K8sGPT
  2. Run `k8sgpt auth add -b localai -m ggml-gpt4all-j -u http://localhost:8080/v1`
  3. Install LocalAI in the cluster with values, e.g. https://github.com/eyalsofer/k8sgpt-localai/blob/main/values.yaml
  4. Proxy LocalAI to k8sgpt locally (see the port-forward sketch below)
  5. Run `k8sgpt analyze --no-cache=true --explain -b localai`
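
For step 4, a minimal port-forward sketch (the deployment name is an assumption based on the Helm chart defaults; adjust to your values.yaml):

```sh
# Expose the in-cluster LocalAI on localhost:8080 so the baseurl
# configured in step 2 resolves:
kubectl port-forward deployment/local-ai 8080:8080
```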

Expected behavior: The requests should not time out but return the inference API response.

Logs

local-ai-966dbbfff-khf5v local-ai 10:56AM DBG Downloading ggml-gpt4all-j: 3.5 GiB/3.5 GiB (99.62%) ETA: 7.836537839s
local-ai-966dbbfff-khf5v local-ai 10:56AM DBG File "ggml-gpt4all-j" downloaded and verified
local-ai-966dbbfff-khf5v local-ai 10:56AM DBG Prompt template "gpt4all-completion" written
local-ai-966dbbfff-khf5v local-ai 10:56AM DBG Prompt template "gpt4all-chat" written
local-ai-966dbbfff-khf5v local-ai 10:56AM DBG Written config file /models/gpt4all-j.yaml
local-ai-966dbbfff-khf5v local-ai 
local-ai-966dbbfff-khf5v local-ai  ┌───────────────────────────────────────────────────┐ 
local-ai-966dbbfff-khf5v local-ai  │                   Fiber v2.48.0                   │ 
local-ai-966dbbfff-khf5v local-ai  │               http://127.0.0.1:8080               │ 
local-ai-966dbbfff-khf5v local-ai  │       (bound on host 0.0.0.0 and port 8080)       │ 
local-ai-966dbbfff-khf5v local-ai  │                                                   │ 
local-ai-966dbbfff-khf5v local-ai  │ Handlers ............ 32  Processes ........... 1 │ 
local-ai-966dbbfff-khf5v local-ai  │ Prefork ....... Disabled  PID ................. 7 │ 
local-ai-966dbbfff-khf5v local-ai  └───────────────────────────────────────────────────┘ 
local-ai-966dbbfff-khf5v local-ai 
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Request received: {"model":"ggml-gpt4all-j","language":"","n":0,"top_p":0,"top_k":0,"temperature":0,"max_tokens":0,"echo":false,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"typical_p":0,"seed":0,"file":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/component=webhook Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook ---.\n\tProvide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:\n\tError: {Explain error here}\n\tSolution: {Step by step solution here}\n\t"}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Configuration read: &{PredictionOptions:{Model:ggml-gpt4all-j Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Parameters: &{PredictionOptions:{Model:ggml-gpt4all-j Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Prompt (before templating): Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/component=webhook Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook ---.
local-ai-966dbbfff-khf5v local-ai   Provide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:
local-ai-966dbbfff-khf5v local-ai   Error: {Explain error here}
local-ai-966dbbfff-khf5v local-ai   Solution: {Step by step solution here}
local-ai-966dbbfff-khf5v local-ai   
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Template found, input modified to: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
local-ai-966dbbfff-khf5v local-ai ### Prompt:
local-ai-966dbbfff-khf5v local-ai Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/component=webhook Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook ---.
local-ai-966dbbfff-khf5v local-ai   Provide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:
local-ai-966dbbfff-khf5v local-ai   Error: {Explain error here}
local-ai-966dbbfff-khf5v local-ai   Solution: {Step by step solution here}
local-ai-966dbbfff-khf5v local-ai   
local-ai-966dbbfff-khf5v local-ai ### Response:# Models to download at runtime
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Prompt (after templating): The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
local-ai-966dbbfff-khf5v local-ai ### Prompt:
local-ai-966dbbfff-khf5v local-ai Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/component=webhook Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook ---.
local-ai-966dbbfff-khf5v local-ai   Provide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:
local-ai-966dbbfff-khf5v local-ai   Error: {Explain error here}
local-ai-966dbbfff-khf5v local-ai   Solution: {Step by step solution here}
local-ai-966dbbfff-khf5v local-ai   
local-ai-966dbbfff-khf5v local-ai ### Response:# Models to download at runtime
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading model 'ggml-gpt4all-j' greedly from all the available backends: llama, gpt4all, falcon, gptneox, bert-embeddings, llama-grammar, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, bloomz, rwkv, whisper, stablediffusion, piper, /build/extra/grpc/huggingface/huggingface.py
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG [llama] Attempting to load
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading model llama from ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading model in memory from file: /models/ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading GRPC Model llama: {backendString:llama modelFile:ggml-gpt4all-j threads:4 assetDir:/tmp/localai/backend_data context:0xc0000c4000 gRPCOptions:0xc00022cc60 externalBackends:map[huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py]}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading GRPC Process%!(EXTRA string=/tmp/localai/backend_data/backend-assets/grpc/llama)
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service for ggml-gpt4all-j will be running at: '127.0.0.1:40139'
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service state dir: /tmp/go-processmanager1178696234
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service Started
local-ai-966dbbfff-khf5v local-ai rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40139: connect: connection refused"
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:40139): stderr 2023/07/27 10:59:08 gRPC Server listening at 127.0.0.1:40139
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service Ready
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:/models/ggml-gpt4all-j ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:40139): stderr llama.cpp: loading model from /models/ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:40139): stderr error loading model: unexpectedly reached end of file
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:40139): stderr llama_load_model_from_file: failed to load model
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG [llama] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG [gpt4all] Attempting to load
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading model gpt4all from ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading model in memory from file: /models/ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading GRPC Model gpt4all: {backendString:gpt4all modelFile:ggml-gpt4all-j threads:4 assetDir:/tmp/localai/backend_data context:0xc0000c4000 gRPCOptions:0xc00022cc60 externalBackends:map[huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py]}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG Loading GRPC Process%!(EXTRA string=/tmp/localai/backend_data/backend-assets/grpc/gpt4all)
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service for ggml-gpt4all-j will be running at: '127.0.0.1:39245'
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service state dir: /tmp/go-processmanager3120951207
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service Started
local-ai-966dbbfff-khf5v local-ai rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39245: connect: connection refused"
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stderr 2023/07/27 10:59:17 gRPC Server listening at 127.0.0.1:39245
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC Service Ready
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:/models/ggml-gpt4all-j ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all}
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: loading model from '/models/ggml-gpt4all-j' - please wait ...
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_vocab = 50400
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_ctx   = 2048
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_embd  = 4096
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_head  = 16
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_layer = 28
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: n_rot   = 64
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: f16     = 2
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: ggml ctx size = 5401.45 MB
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: kv self size  =  896.00 MB
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: ................................... done
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG GRPC(ggml-gpt4all-j-127.0.0.1:39245): stdout gptj_model_load: model size =  3609.38 MB / num tensors = 285
local-ai-966dbbfff-khf5v local-ai 10:59AM DBG [gpt4all] Loads OK

local-ai-966dbbfff-khf5v local-ai 11:01AM DBG Response: {"object":"chat.completion","model":"ggml-gpt4all-j","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":", which should be updated with new data on a regular basis.\nAs an AI language model, I do not have access to the specific details of your project or environment. However, in general terms:\n1) You should consider the following options for creating and managing your models:\n- Creating a separate repository for each model, which could be managed independently and updated on a regular basis.\n- Creating an API endpoint that exposes your models as a RESTful interface, which could be accessed by clients that need to download or update the models.\n- Using a database to store your model data, which could be updated with new models on a regular basis.\n- You could also consider using a cloud storage solution, such as AWS S3 or Google Cloud Storage to store and share your models with other applications or services.\n2) It's also important to consider the following factors when choosing a cloud provider:\n- Security and compliance requirements (e.g., HIPAA, GDPR)\n- Scalability and performance capabilities (e.g., availability, pricing)\n- Integration and collaboration capabilities (e.g., APIs, webhooks)\n- Disaster recovery and backup solutions (e.g., AWS Backup, Google Cloud Storage)\n- Support and documentation resources (e.g., AWS Knowledge Center, Google Cloud Platform documentation)\n- Cost and usage limits (e.g., cost-effective pricing, usage limits)\nUltimately, the best solution will depend on your specific project requirements and constraints."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Additional context

eyalsofer commented 1 year ago

Hi, I'm experiencing the same issue, also on the latest version, installed with the Helm chart. I can see the requests from k8sgpt, and local-ai's response in the local-ai log, but for some reason the results come back empty:

DBG Prompt (before templating): Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Deployment WTUzQHs3JQ==/ezpCQDtvdWhk has 1 replicas but 2 are available ---. Error: {Explain error here} Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Deployment WTUzQHs3JQ==/ezpCQDtvdWhk has 1 replicas but 2 are available ---. Error: {Explain error here} Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Deployment WTUzQHs3JQ==/ezpCQDtvdWhk has 1 replicas but 2 are available ---. Error: {Explain error here} 10:42AM DBG Response: {"object":"chat.completion","model":"ggml-gpt4all-j","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":": #\nThe error message \"Deploymment has 1 replica but 2 are available\" indicates that there is a mismatch between the number of replicas and available replica instances. To resolve this, you can try the following steps:\n1. Check if there are any pending or failed deployments in the current namespace.\n2. Check if there are any replicas that have not been deployed or updated.\n3. Check if there are any replicas that have been deleted or moved to another namespace.\n4. Check if there are any replicas that have not been created or added to the deployment.\n5. If there are any pending or failed deployments, delete them and try deploying the deployment again.\n6. If there are any replicas that have not been deployed or updated, create a new replica and update it with the latest data.\n7. If there are any replicas that have been deleted or moved to another namespace, move them back and update the deployment.\n8. If there are any replicas that have not been created or added to the deployment, create a new replica and add it to the deployment.\n9. If there are any issues with the deployment, check if all the required resources are available and try again.\n10. If there are any issues with the deployment, check if all the required resources are available and try again.\n11. If there are any issues with the deployment, check if all the required resources are available and try again.\n12. If there are any issues with the deployment, check if all the required resources are available and try again.\n13. If there are any issues with the deployment, check if all the required resources are available and try again.\n14. If there are any issues with the deployment, check if all the required resources are available and try again.\n15. If there are any issues with the deployment, check if all the required resources are available and try again.\n16. If there are any issues with the deployment, check if all the required resources are available and try again.\n17. If there are any issues with the deployment, check if all the required resources are available and try again.\n18. If there are any issues with the deployment, check if all the required resources are available and try again.\n19. If there are any issues with the deployment, check if all the required resources are available and try again.\n20. If there are any issues with the deployment, check if all the required resources"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

and when I run the results command `kubectl get results -o json | jq .` I get empty results:

{ "apiVersion": "v1", "items": [], "kind": "List", "metadata": { "resourceVersion": "" } }

mudler commented 1 year ago

This doesn't look like an issue with LocalAI itself, but rather with the model configuration and the model being used.

The actual reply from the LLM is there:

local-ai-966dbbfff-khf5v local-ai 11:01AM DBG Response: {"object":"chat.completion","model":"ggml-gpt4all-j","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":", which should be updated with new data on a regular basis.\nAs an AI language model, I do not have access to the specific details of your project or environment. However, in general terms:\n1) You should consider the following options for creating and managing your models:\n- Creating a separate repository for each model, which could be managed independently and updated on a regular basis.\n- Creating an API endpoint that exposes your models as a RESTful interface, which could be accessed by clients that need to download or update the models.\n- Using a database to store your model data, which could be updated with new models on a regular basis.\n- You could also consider using a cloud storage solution, such as AWS S3 or Google Cloud Storage to store and share your models with other applications or services.\n2) It's also important to consider the following factors when choosing a cloud provider:\n- Security and compliance requirements (e.g., HIPAA, GDPR)\n- Scalability and performance capabilities (e.g., availability, pricing)\n- Integration and collaboration capabilities (e.g., APIs, webhooks)\n- Disaster recovery and backup solutions (e.g., AWS Backup, Google Cloud Storage)\n- Support and documentation resources (e.g., AWS Knowledge Center, Google Cloud Platform documentation)\n- Cost and usage limits (e.g., cost-effective pricing, usage limits)\nUltimately, the best solution will depend on your specific project requirements and constraints."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

@AlexsJones did you change k8sgpt lately to ask the LLM to return data in a specific format? It's the first time I see this:

local-ai-966dbbfff-khf5v local-ai Simplify the following Kubernetes error message delimited by triple dashes written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/component=webhook Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook ---.
local-ai-966dbbfff-khf5v local-ai   Provide the most possible solution in a step by step style in no more than 280 characters. Write the output in the following format:
local-ai-966dbbfff-khf5v local-ai   Error: {Explain error here}
local-ai-966dbbfff-khf5v local-ai   Solution: {Step by step solution here}

A note: this will probably break most of the small LLM models, which are not really good at formatting output.

AlexsJones commented 1 year ago

We did evolve the prompt to structure the response, which works well on all the hosted APIs. To eliminate that variable I have removed it locally and am using this:

    simple_prompt = "Simplify the following Kubernetes error message written in --- %s --- language; --- %s ---. Provide the most possible solution in a step by step style in no more than 280 characters"
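
As an illustration (an editor's addition, not from the thread): the two `%s` slots take the language and the raw error message, so the filled-in prompt seen in the pod logs below can be previewed with a plain `printf`:

```sh
# Same format string as simple_prompt above; example values taken
# from the k8sgpt analysis in the logs:
printf 'Simplify the following Kubernetes error message written in --- %s --- language; --- %s ---. Provide the most possible solution in a step by step style in no more than 280 characters\n' \
  'english' \
  'Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager'
```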

I can see LocalAI is generating responses in the pod, but they don't seem to come back to my request.

In the LocalAI pod:

local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Request received: {"model":"ggml-gpt4all-j","language":"","n":0,"top_p":0,"top_k":0,"temperature":0,"max_tokens":0,"echo":false,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"typical_p":0,"seed":0,"file":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"Simplify the following Kubernetes error message written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook Service has no endpoints, expected label app.kubernetes.io/component=webhook ---. Provide the most possible solution in a step by step style in no more than 280 characters"}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null}
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Configuration read: &{PredictionOptions:{Model:ggml-gpt4all-j Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Parameters: &{PredictionOptions:{Model:ggml-gpt4all-j Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:}
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Prompt (before templating): Simplify the following Kubernetes error message written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook Service has no endpoints, expected label app.kubernetes.io/component=webhook ---. Provide the most possible solution in a step by step style in no more than 280 characters
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Template found, input modified to: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
local-ai-966dbbfff-khf5v local-ai ### Prompt:
local-ai-966dbbfff-khf5v local-ai Simplify the following Kubernetes error message written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook Service has no endpoints, expected label app.kubernetes.io/component=webhook ---. Provide the most possible solution in a step by step style in no more than 280 characters
local-ai-966dbbfff-khf5v local-ai ### Response:# Models to download at runtime
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Prompt (after templating): The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
local-ai-966dbbfff-khf5v local-ai ### Prompt:
local-ai-966dbbfff-khf5v local-ai Simplify the following Kubernetes error message written in --- english --- language; --- Service has no endpoints, expected label app.kubernetes.io/instance=cert-manager Service has no endpoints, expected label app.kubernetes.io/name=webhook Service has no endpoints, expected label app.kubernetes.io/component=webhook ---. Provide the most possible solution in a step by step style in no more than 280 characters
local-ai-966dbbfff-khf5v local-ai ### Response:# Models to download at runtime
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Model already loaded in memory: ggml-gpt4all-j
local-ai-966dbbfff-khf5v local-ai 12:17PM DBG Model 'ggml-gpt4all-j' already loaded
local-ai-966dbbfff-khf5v local-ai 12:18PM DBG Response: {"object":"chat.completion","model":"ggml-gpt4all-j","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"for Kuberentes endpoints\nTo solve the error message, you need to add endpoints for the Kubernetes service in your app.yaml file or specify the endpoints in your client configuration. \nYou can use the `--server` option to specify a Kubernetes API server URL and use it in your client configuration. \nYou can also use the `--server-config` option to specify a configuration file that contains the Kubernetes API server URL and other configuration options. \nIn addition, you need to make sure that the Kubernetes API server is running and accessible from your application."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
local-ai-966dbbfff-khf5v local-ai [127.0.0.1]:50378  200  -  POST     /v1/chat/completions

Locally, k8sgpt (via port-forward):

    ~/Code/k8sgpt    main ⇣1  go run ./main.go analyze --no-cache=true --explain -b localai
   0% |                                                                | (0/4, 0 it/hr) [0s:0s]
Error: failed while calling AI provider localai: Post "http://localhost:8080/v1/chat/completions": EOF
exit status 1
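
One way to probe the forwarded port independently of k8sgpt (an editor's sketch, not from the thread) is the OpenAI-compatible model listing:

```sh
# A healthy tunnel should return a JSON list that includes
# ggml-gpt4all-j; an immediate EOF here would implicate the proxy:
curl -s http://localhost:8080/v1/models
```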
AlexsJones commented 1 year ago

I will try with curl also to validate

eyalsofer commented 1 year ago

@mudler - can you verify the preload_models and promptTemplates config are correct?

preload_models: '[{ "url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "overrides": { "parameters": { "model": "ggml-gpt4all-j" }}, "files": [ { "uri": "https://gpt4all.io/models/ggml-gpt4all-j.bin", "sha256": "acd54f6da1cad7c04c48b785178d686c720dcbe549903032a0945f97b1a43d20", "filename": "ggml-gpt4all-j" }]}]'

```yaml
promptTemplates:
  ggml-gpt4all-j.tmpl: |
    The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
    ### Prompt:
    {{.Input}}
    ### Response:# Models to download at runtime
```
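
A side check worth noting (an editor's addition): the llama backend's earlier "unexpectedly reached end of file" is likely a format mismatch rather than corruption, since the gpt4all backend loads the same file fine, but the sha256 pinned in preload_models makes it easy to rule out a bad download:

```sh
# Path taken from the pod logs; compare against the digest pinned
# in the preload_models entry above:
sha256sum /models/ggml-gpt4all-j
# expected: acd54f6da1cad7c04c48b785178d686c720dcbe549903032a0945f97b1a43d20
```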

mudler commented 1 year ago

### Response:# Models to download at runtime

doesn't look right: the `# Models to download at runtime` comment seems to have leaked from the values file into the prompt template itself (and from there into every prompt, as the logs above show).
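
A cleaned-up template might look like this (an editor's sketch; it assumes the trailing comment belongs to a following key in values.yaml, not to the template):

```sh
# Rewrite the template so the model sees a clean "### Response:" marker
# instead of one with a YAML comment glued onto it:
cat > ggml-gpt4all-j.tmpl <<'EOF'
The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
### Prompt:
{{.Input}}
### Response:
EOF
```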

mudler commented 1 year ago

I will try with curl also to validate

can you try curling from the localai pod? maybe the proxy is getting in between?

AlexsJones commented 1 year ago

I will try with curl also to validate

can you try curling from the localai pod? maybe the proxy is getting in between?

root@local-ai-966dbbfff-khf5v:/build# curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j.bin",            
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'

{"object":"text_completion","model":"ggml-gpt4all-j.bin","choices":[{"index":0,"finish_reason":"stop","text":"…\nThere was an alien race known as the X-Men. They were a group of mutants who had the ability to control and manipulate their powers. They were often feared by humans, but they also had a strong sense of community and were known for their kind hearts."}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}root@local-ai-966dbbfff-khf5v:/build# 
root@local-ai-966dbbfff-khf5v:/build# 

So this is good: we can curl inside the pod and it works. However, via the proxy or k8sgpt externally we see it hang. 🤔

mudler commented 1 year ago

So this is good: we can curl inside the pod and it works. However, via the proxy or k8sgpt externally we see it hang.

When trying from outside, add `stream: true` so you can see it live:

 curl <endpoint>/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j.bin",            
     "stream": true,
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'

This way we can tell whether it's just slow to return the full answer, or whether there are issues reaching the service.
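
A hedged variant of that check (not from the thread) also times the call and disables curl's output buffering, so streamed chunks print as they arrive:

```sh
# -N/--no-buffer prints tokens as they stream; `time` distinguishes a
# slow-but-complete answer from a connection that drops early:
time curl -N http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ggml-gpt4all-j.bin",
    "stream": true,
    "prompt": "A long time ago in a galaxy far, far away",
    "temperature": 0.7
  }'
```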

AlexsJones commented 1 year ago

Thanks, I will give it a go tomorrow. This blocks me rn.


```
local-ai 7:48PM ERR error: Get "https://raw.githubusercontent.com/go-skynet/model-gallery/main/gpt4all-j.yaml": dial tcp: lookup raw.githubusercontent.com: i/o timeout
```
💀