k8sgpt-ai / k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster
https://k8sgpt.ai
Apache License 2.0

[Question]: k8sgpt-operator fails with local-ai #233

Open rwlove opened 11 months ago

rwlove commented 11 months ago

Checklist

Affected Components

K8sGPT Version

0.0.21

Kubernetes Version

v1.28.1

Host OS and its Version

CentOS Stream 9

Steps to reproduce

I'm running a homelab and trying to use k8sgpt with local-ai, because k8sgpt keeps hitting the API rate limits imposed on my OpenAI developer account.

local-ai is running and I can query it from within the cluster:

tmp-shell  ~  curl http://localai-local-ai.ai.svc.cluster.local/v1/models
{"object":"list","data":[{"id":"ggml-gpt4all-j.bin","object":"model"}]}

Here is my config:

Name:         k8sgpt
Namespace:    ai
Labels:       kustomize.toolkit.fluxcd.io/name=ai-k8sgpt-config
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  core.k8sgpt.ai/v1alpha1
Kind:         K8sGPT
Metadata:
  Creation Timestamp:  2023-10-02T20:53:27Z
  Finalizers:
    k8sgpt.ai/finalizer
  Generation:        2
  Resource Version:  16419139
  UID:               6ff8f454-3562-4e27-a8cd-a88093322cc9
Spec:
  Ai:
    Anonymized:  true
    Backend:     localai
    Base URL:    http://localai-local-ai.ai.svc.cluster.local/v1
    Enabled:     true
    Language:    english
    Model:       ggml-gpt4all-j.bin
  Version:       v0.2.7
Events:          <none>
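
For reference, the describe output above corresponds to a manifest along these lines. This is a reconstruction from the fields shown (camelCase field names are inferred from how kubectl describe renders them); the Flux labels and status metadata are omitted:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt
  namespace: ai
spec:
  ai:
    enabled: true
    backend: localai
    baseUrl: http://localai-local-ai.ai.svc.cluster.local/v1
    model: ggml-gpt4all-j.bin
    language: english
    anonymized: true
  version: v0.2.7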

However, k8sgpt fails with the following error:

Creating new client for 10.43.135.122:8080
Connection established between 10.43.135.122:8080 and localhost with time out of 1 seconds.
Remote Address : 10.43.135.122:8080 
K8sGPT address: 10.43.135.122:8080
Finished Reconciling k8sGPT with error: failed to call Analyze RPC: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: http2: frame too large"
2023-10-02T20:59:25Z    ERROR   Reconciler error    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt","namespace":"ai"}, "namespace": "ai", "name": "k8sgpt", "reconcileID": "246ab508-eaeb-431a-b617-0ab5676b591b", "error": "failed to call Analyze RPC: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
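
As an aside, "error reading server preface: http2: frame too large" is what a gRPC client typically reports when the address it dialed answers with something other than HTTP/2 (for example a plain HTTP server, or the wrong port). A rough way to check what is actually behind 10.43.135.122:8080, using only generic kubectl queries (no assumptions about resource names):

kubectl -n ai get svc -o wide | grep 10.43.135.122    # which Service owns that ClusterIP
kubectl -n ai get endpoints | grep -i k8sgpt          # which pod/port actually backs it
kubectl -n ai get pods -o wide | grep -i k8sgpt       # then check the logs of that pod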

Expected behaviour

k8sgpt connects to local-ai without error.

Actual behaviour

The errors seem to suggest a communication problem between k8sgpt and localai.

Additional Information

Mix of bare metal and VM nodes.

Local AI and k8sgpt installed via Helm charts and Flux GitOps.

rwlove commented 11 months ago

I guess I got past the above error and I'm now seeing this error:

Finished Reconciling k8sGPT with error: failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider localai: Post "http://localai-local-ai.ai.svc.cluster.local/v1/chat/completions": dial tcp 10.43.182.197:80: connect: operation not permitted
2023-10-02T21:50:42Z    ERROR   Reconciler error    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt","namespace":"ai"}, "namespace": "ai", "name": "k8sgpt", "reconcileID": "1b86f5c9-8a68-419e-aa7f-f32a5722c808", "error": "failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider localai: Post \"http://localai-local-ai.ai.svc.cluster.local/v1/chat/completions\": dial tcp 10.43.182.197:80: connect: operation not permitted"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
{"level":"info","ts":1696283409.5358725,"caller":"server/log.go:50","msg":"request failed. failed while calling AI provider localai: Post \"http://localai-local-ai.ai.svc.cluster.local/v1/chat/completions\": EOF","duration_ms":36806,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" explain:true anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.42.7.73:59702","status_code":2}

Also, this configuration is causing localai to crash-loop:

9:51PM INF LocalAI version: v1.30.0 (274ace289823a8bacb7b4987b5c961b62d5eee99)

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.49.2                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 70  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................. 9 │ 
 └───────────────────────────────────────────────────┘ 

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41539: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35607: connect: connection refused"

arbreezy commented 11 months ago

@rwlove are you sure you can connect from k8sgpt's namespace to your localai instance? It looks like a networking issue.
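
A quick way to verify that, assuming the k8sgpt workload runs in the same ai namespace as the CR, is to run a throwaway curl pod there against the same endpoint:

kubectl -n ai run k8sgpt-net-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -sv http://localai-local-ai.ai.svc.cluster.local/v1/models

If this fails while the earlier tmp-shell test succeeded, the difference between the two pods' namespaces or NetworkPolicies is the place to look.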

rwlove commented 11 months ago

@arbreezy I can run some tests to make sure it's working, but I'm pretty sure my networking is solid.

I started a thread in the k8sgpt Slack/Discord, and they sent me to the LocalAI Slack/Discord, where a nice person tried to help me but also could not get it to work.

I'm worried that I'm running into this unresolved issue: https://github.com/go-skynet/LocalAI/issues/771

I'm following this guidance pretty closely, along with a few blogs, but no luck: https://github.com/go-skynet/LocalAI/blob/master/examples/k8sgpt/README.md

rwlove commented 11 months ago

Maybe a bit off topic, but the main reason that I'm trying to use LocalAI is that k8sgpt fails with OpenAI. It tells me that OpenAI is restricting my API calls to 3 per minute, IIRC. I'm just a homelabber with a free OpenAI account. Is there a reason that k8sgpt can't just rate-limit itself? There's probably no urgency to make a ton of API calls immediately.

arbreezy commented 11 months ago

@rwlove I will test the local-ai integration again and write some feedback when I find more spare time.

Re: rate limiting, we can definitely be more clever with requests to AI backends, but in the meantime just toggle the explain: true flag to stop incurring OpenAI costs.

Not sure I get the argument around urgency, as it's probably subjective, but it should be configurable nevertheless.

rwlove commented 11 months ago

@arbreezy Where do I set explain: true?

arbreezy commented 10 months ago

Ah, apologies @rwlove, in the operator's CR it's the following:

...
ai:
  enabled: false
...
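
Spelled out against the CR from the original report, that would look roughly like this sketch (other fields left as they were):

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt
  namespace: ai
spec:
  ai:
    enabled: false   # turns off calls to the AI backend, so no more OpenAI rate-limit hits
    backend: localai
    baseUrl: http://localai-local-ai.ai.svc.cluster.local/v1
    model: ggml-gpt4all-j.bin
  version: v0.2.7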