k8sgpt-ai / k8sgpt

Giving Kubernetes Superpowers to everyone
http://k8sgpt.ai

k8sgpt analyse --explain fails when using localai (401) #981

Closed: ronaldpetty closed this issue 7 months ago

ronaldpetty commented 8 months ago


K8sGPT Version

0.3.24

Kubernetes Version

v1.26.9

Host OS and its Version

Ubuntu 22.04

Steps to reproduce

Install k8sgpt-operator and localai (sorry, I can add steps later, but I had some issues and it's not trivial to do).

Running k8sgpt analyse -b localai works, but adding --explain causes an auth issue.

~$ k8sgpt analyse -b localai 
AI Provider: localai

0 argocd/argocd-application-controller(argocd-application-controller)
- Error: StatefulSet uses the service argocd/argocd-application-controller which does not exist.

1 team-1/test3(test3)
- Error: StatefulSet uses the service team-1/ which does not exist.

2 default/broken-pod(broken-pod)
- Error: Back-off pulling image "nginx:1.a.b.c"

~$ k8sgpt analyse -b localai --explain
   0% |                                                                                                       | (0/3, 0 it/hr) [0s:0s]
Error: failed while calling AI provider localai: error, status code: 401, message: You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.

Expected behaviour

I assume --explain gives more information (I am new here, so not certain, but I didn't expect an error).

Actual behaviour

Error: failed while calling AI provider localai: error, status code: 401, message: You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.

Additional Information

No response

VaibhavMalik4187 commented 8 months ago

The --explain flag uses the AI provider configured by the user to run an analysis, communicating with the AI through its API keys. LocalAI doesn't use any API keys, which is why you're facing this error. However, I think the output could be improved to help the user understand the real problem here.
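
A possible workaround sketch (untested here): registering the localai backend with an explicit --baseurl should keep --explain requests pointed at the local endpoint instead of falling through to the OpenAI default, which is what produces that 401 message. The model name and service URL below are illustrative assumptions:

# Hypothetical example: point the localai backend at the in-cluster
# LocalAI service so --explain does not hit the OpenAI default endpoint.
k8sgpt auth add \
  --backend localai \
  --model ggml-gpt4all-j \
  --baseurl http://local-ai.default.svc.cluster.local:80/v1

k8sgpt analyse -b localai --explain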

sysnet4admin commented 7 months ago

@ronaldpetty FYI, this is what it looks like when the model is working properly.

Here is the result... even though it could not respond in Korean :)


root@cp-k8s:~# k8sgpt analyze --explain -b localai -l Korean
 100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (2/2, 52 it/hr)           
AI Provider: localai

0 default/llama-2-7b-chat(llama-2-7b-chat)
- Error: Service has not ready endpoints, pods: [Pod/llama-2-7b-chat-55f89d58df-bwlzt], expected 1
 Error: Service has not ready endpoints, pods: [Pod/llama-2-7b-chat-55f89d58df-bwlzt], expected 1.

Solution:

1. Check if the service is running by running `kubectl get svc llama-2-7b-chat`
2. If the service is not running, run `kubectl start sfllama-2-7b-chat` to start it.
3. Wait for the service to be ready by running `kubectl wait --for=service=llama-2-7b-chat-ready`
4. Once the service is ready, check the endpoints by running `kubectl get ep llama-2-7b-chat`
1 default/llama-2-7b-chat-55f89d58df-bwlzt(Deployment/llama-2-7b-chat)
- Error: failed to generate container "288eac52810fb2072ef983c70dd57bba05cdf181524a92ff51e5ca55d4da8084" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/60c6876c-2a05-4499-92e8-5f326a9e046e/volumes/kubernetes.io~nfs/pvc-d82ae098-9e08-4dc7-90cf-09e4b9dd79cd": stat /var/lib/kubelet/pods/60c6876c-2a05-4499-92e8-5f326a9e046e/volumes/kubernetes.io~nfs/pvc-d82ae098-9e08-4dc7-90cf-09e4b9dd79cd: stale NFS file handle
 Error: failed to stat "/var/lib/kubelet/pods/60c6876c-2a05-4499-92e8-5f326a9e046e/volumes/kubernetes.io~nfs/pvc-d82ae098-9e08-4dc7-90cf-09e4b9dd79cd": stale NFS file handle
Solution:
1. Check and update the Kubernetes version on your cluster.
2. Run `kubectl cleanup` to remove any stale state left over from previous runs.
3. Restart the kubelet to refresh the file handles.
4. Try running the `kubectl create` command again.

ronaldpetty commented 7 months ago

@sysnet4admin I am very new to this. Per @VaibhavMalik4187, it sounds like this would never work (unless local-ai started accepting tokens). Can you share how your setup is done? I used the k8sgpt operator and LocalAI.

sysnet4admin commented 7 months ago

@ronaldpetty I just use the k8sgpt binary + ollama (https://github.com/ollama/ollama). Hopefully that helps clarify things.
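
Roughly, the setup looks like the sketch below. This assumes a recent ollama that exposes an OpenAI-compatible /v1 endpoint on its default port 11434; the model name is just an example:

# Hypothetical sketch: binary k8sgpt talking to a local ollama server.
ollama pull llama2                 # pull a model (name is an assumption)
ollama serve &                     # ollama listens on :11434 by default

# Register ollama's OpenAI-compatible endpoint as the localai backend:
k8sgpt auth add --backend localai --model llama2 \
  --baseurl http://localhost:11434/v1

k8sgpt analyze --explain -b localai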

sysnet4admin commented 7 months ago

You could also leverage text-generation-webui to load models easily in a local environment.

JuHyung-Son commented 7 months ago

How did you set up your localai?

liyimeng commented 7 months ago

> @ronaldpetty I just use the k8sgpt binary + ollama (https://github.com/ollama/ollama). Hopefully that helps clarify things.

@sysnet4admin would you kindly share your backend configuration? I tried to use ollama, but could not find any docs on how to set it up. Thanks a lot!

sysnet4admin commented 7 months ago

@JuHyung-Son @liyimeng https://yozm.wishket.com/magazine/detail/2516/ (the writer is me, @sysnet4admin).

Hopefully it will help you understand the setup and move further. (Unfortunately the article is written in Korean, but the content should be understandable since the commands are in English.)

liyimeng commented 7 months ago

Thanks for the instant response! @sysnet4admin I will give it a shot!

ronaldpetty commented 7 months ago

Thank you @sysnet4admin. I think I am starting to understand based on your tutorial. I was originally using this tutorial:

https://itnext.io/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65

I believe these issues were causing me trouble.

Once I updated the cluster service port, the errors went away. However, I am still confused about the basic operation of k8sgpt.

~$ k8sgpt analyze 
AI Provider: AI not used; --explain not set

0 default/broken-pod(broken-pod)
- Error: Back-off pulling image "nginx:1.a.b.c"

What does that even mean ("AI not used")? Does that mean k8sgpt still runs the analysis even if there are no backends?

Another thing I noticed: if you have no K8sGPT resource, nothing happens (e.g., you submit a broken pod and never get a result). If you do have a K8sGPT resource, even with a broken baseUrl (like I had), it still creates a result. Again, that is confusing; it implies no AI is needed. Maybe it says that somewhere, but I must be missing it.
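
(For what it's worth, here is how I poked at that behavior; this is a hedged sketch assuming the operator's Result CRD lives under the core.k8sgpt.ai group, and the resource name is taken from the operator logs further down.)

# List the Result resources the operator created:
kubectl get results.core.k8sgpt.ai -n default

# Inspect one of them (name taken from the operator logs below):
kubectl describe results.core.k8sgpt.ai defaultbrokenpod -n default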

I think my issue title might need to be changed to mention both local-ai and localai.

Final thing: I installed the operator via helm. The logs look like this:

...
Finished Reconciling k8sGPT
Creating new client for 10.43.58.77:8080
Connection established between 10.43.58.77:8080 and localhost with time out of 1 seconds.
Remote Address : 10.43.58.77:8080 
K8sGPT address: 10.43.58.77:8080
Checking if defaultbrokenpod is still relevant

It would be helpful if the operator logged who it was talking to (maybe it's not talking to anyone, but then I don't understand this configuration):

~$ cat k8sgpt-localai.yaml 
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    baseUrl: http://local-ai.default.svc.cluster.local:80/v1
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8

Here are the services I have.

~$ kubectl get svc
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes        ClusterIP   10.43.0.1      <none>        443/TCP    156m
local-ai          ClusterIP   10.43.90.122   <none>        80/TCP     122m
k8sgpt-local-ai   ClusterIP   10.43.58.77    <none>        8080/TCP   19m
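
(A sanity check worth running here, assuming LocalAI serves the standard OpenAI-compatible API: confirm the baseUrl from the K8sGPT resource is actually reachable inside the cluster. The throwaway pod name and image are arbitrary.)

# Hypothetical check: curl the configured baseUrl from a temporary pod.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://local-ai.default.svc.cluster.local:80/v1/models
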
ronaldpetty commented 7 months ago

I made some more progress. I got the k8sgpt CLI tool working with OpenAI. I went back to LocalAI and made a little more progress. I tried the k8sgpt CLI in a Pod, since that is where I installed LocalAI. It seems that I am misusing LocalAI.

root@testing:/# k8sgpt auth add --backend localai --model tinyllama --baseurl http://local-ai.default.svc.cluster.local:80/v1
localai added to the AI backend provider list
root@testing:/# k8sgpt analyze --explain -b localai --kubeconfig=.kube/config
   0% |                                                                                                                                                                                                                            | (0/1, 0 it/hr) [0s:0s]
Error: failed while calling AI provider localai: error, status code: 500, message: could not load model - all backends returned error: 23 errors occurred:
    * could not load model: rpc error: code = Canceled desc = 
    * could not load model: rpc error: code = Unknown desc = failed loading model
    * could not load model: rpc error: code = Unknown desc = failed loading model
    * could not load model: rpc error: code = Unknown desc = failed loading model
    * could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
    * could not load model: rpc error: code = Unknown desc = stat /models/add: no such file or directory
    * could not load model: rpc error: code = Unknown desc = stat /models/add: no such file or directory
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * could not load model: rpc error: code = Unknown desc = unsupported model type /models/add (should end with .onnx)
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
    * grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS

Below are the logs from LocalAI. At least I know it's hitting it, but it seems to be failing due to some configuration issue. I guess I will close this ticket and focus on LocalAI.

4:17AM INF Trying to load the model 'add' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/transformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/exllama/run.sh
4:17AM INF [llama-cpp] Attempting to load
4:17AM INF Loading model 'add' with backend llama-cpp
4:17AM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc = 
4:17AM INF [llama-ggml] Attempting to load
4:17AM INF Loading model 'add' with backend llama-ggml
4:17AM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
4:17AM INF [gpt4all] Attempting to load
4:17AM INF Loading model 'add' with backend gpt4all
4:17AM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
4:17AM INF [bert-embeddings] Attempting to load
4:17AM INF Loading model 'add' with backend bert-embeddings
4:17AM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
4:17AM INF [rwkv] Attempting to load
4:17AM INF Loading model 'add' with backend rwkv
4:17AM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
4:17AM INF [whisper] Attempting to load
4:17AM INF Loading model 'add' with backend whisper
4:17AM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = stat /models/add: no such file or directory
4:17AM INF [stablediffusion] Attempting to load
4:17AM INF Loading model 'add' with backend stablediffusion
4:17AM INF [stablediffusion] Fails: could not load model: rpc error: code = Unknown desc = stat /models/add: no such file or directory
4:17AM INF [tinydream] Attempting to load
4:17AM INF Loading model 'add' with backend tinydream
4:17AM INF [tinydream] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [piper] Attempting to load
4:17AM INF Loading model 'add' with backend piper
4:17AM INF [piper] Fails: could not load model: rpc error: code = Unknown desc = unsupported model type /models/add (should end with .onnx)
4:17AM INF [/build/backend/python/transformers/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/transformers/run.sh
4:17AM INF [/build/backend/python/transformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/vllm/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/vllm/run.sh
4:17AM INF [/build/backend/python/vllm/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/transformers-musicgen/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/transformers-musicgen/run.sh
4:17AM INF [/build/backend/python/transformers-musicgen/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/sentencetransformers/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/sentencetransformers/run.sh
4:17AM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/autogptq/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/autogptq/run.sh
4:17AM INF [/build/backend/python/autogptq/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/sentencetransformers/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/sentencetransformers/run.sh
4:17AM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/coqui/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/coqui/run.sh
4:17AM INF [/build/backend/python/coqui/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/exllama2/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/exllama2/run.sh
4:17AM INF [/build/backend/python/exllama2/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/bark/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/bark/run.sh
4:17AM INF [/build/backend/python/bark/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/mamba/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/mamba/run.sh
4:17AM INF [/build/backend/python/mamba/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/diffusers/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/diffusers/run.sh
4:17AM INF [/build/backend/python/diffusers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/vall-e-x/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/vall-e-x/run.sh
4:17AM INF [/build/backend/python/vall-e-x/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/petals/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/petals/run.sh
4:17AM INF [/build/backend/python/petals/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
4:17AM INF [/build/backend/python/exllama/run.sh] Attempting to load
4:17AM INF Loading model 'add' with backend /build/backend/python/exllama/run.sh
4:17AM INF [/build/backend/python/exllama/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
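
(One hedged observation: the logs show LocalAI trying to load a model literally named 'add' rather than tinyllama, which might mean the configured model name never reached the server. Hitting LocalAI's OpenAI-compatible API directly, bypassing k8sgpt, could isolate that; the model name below is the one registered above.)

# List the models LocalAI actually knows about:
curl -s http://local-ai.default.svc.cluster.local:80/v1/models

# Try a chat completion directly, bypassing k8sgpt:
curl -s http://local-ai.default.svc.cluster.local:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama", "messages": [{"role": "user", "content": "hello"}]}'
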
gyliu513 commented 1 month ago

@ronaldpetty are you using vLLM for your local AI test? Thanks

ronaldpetty commented 1 month ago

Hi @gyliu513, it's been a long time since this ticket. I am not using vLLM lately. I will try again, and if I get it working I will update.

gyliu513 commented 1 month ago

@ronaldpetty no worries, here is a blog post on k8sgpt with vLLM: https://medium.com/@panpan0000/empower-kubernetes-with-k8sgpt-using-open-source-llm-1b3fa021abd6. Thanks!