k8sgpt-ai / k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster
https://k8sgpt.ai
Apache License 2.0

[Question]: I have two questions regarding its usage. May I ask if they can be resolved? #473

Open yangy30 opened 4 months ago

yangy30 commented 4 months ago

Checklist

Affected Components

K8sGPT Version

v0.1.6

Kubernetes Version

v1.28.11

Host OS and its Version

Rocky Linux 8.10

Steps to reproduce

1. While the underlying error persists, the Result objects are intermittently missing:

# k get results -n monitoring  
NAME                                     KIND   BACKEND
defaultnginxdeployment26b7b6f9774b4wng   Pod    openai
# k get pod
NAME                                 READY   STATUS             RESTARTS   AGE
nginx-deployment2-6b7b6f9774-b4wng   0/1     ImagePullBackOff   0          2m21s
# k get results -n monitoring  
No resources found in monitoring namespace.
# k get results -n monitoring  
NAME                                     KIND   BACKEND
defaultnginxdeployment26b7b6f9774b4wng   Pod    openai

2. The K8sGPT operator prints error logs, although functionality does not appear to be affected:

Created result defaultnginxdeployment56f9d4488hx589
Finished Reconciling k8sGPT
Creating new client for 10.108.197.164:8080
Connection established between 10.108.197.164:8080 and localhost with time out of 1 seconds.
Remote Address : 10.108.197.164:8080 
K8sGPT address: 10.108.197.164:8080
Checking if defaultnginxdeployment56f9d4488hx589 is still relevant
Finished Reconciling k8sGPT with error: Operation cannot be fulfilled on results.core.k8sgpt.ai "defaultnginxdeployment56f9d4488hx589": the object has been modified; please apply your changes to the latest version and try again
2024-07-16T19:49:06Z    ERROR   Reconciler error    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"monitoring"}, "namespace": "monitoring", "name": "k8sgpt-sample", "reconcileID": "2b6fe54e-750e-4731-a364-d689a2665448", "error": "Operation cannot be fulfilled on results.core.k8sgpt.ai \"defaultnginxdeployment56f9d4488hx589\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
Creating new client for 10.108.197.164:8080
Connection established between 10.108.197.164:8080 and localhost with time out of 1 seconds.
Remote Address : 10.108.197.164:8080 
K8sGPT address: 10.108.197.164:8080
Checking if defaultnginxdeployment56f9d4488hx589 is still relevant
Finished Reconciling k8sGPT

Expected behaviour

  1. Is it possible to display the Result objects consistently while the errors persist?
  2. How can I eliminate these error logs from the K8sGPT operator?

Actual behaviour

No response

Additional Information

Configuration

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: monitoring
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    baseUrl: https://api.chatanywhere.tech
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    language: chinese
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8
arbreezy commented 3 months ago

Hey @yangy30 , thanks for raising this.

I will also try to reproduce this; it looks like we can handle the lifecycle of the Result object better.

By the looks of it, the operator is updating the Result object with a stale resourceVersion, and the operation then succeeds when it is retried in the next reconciliation loop.
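
For reference, the usual controller-runtime pattern for avoiding that class of error is to re-read the object and retry the update on conflict via k8s.io/client-go/util/retry. The following is a minimal sketch, not the operator's actual code; the helper name and the controllers package are illustrative:

package controllers

import (
    "context"

    "k8s.io/client-go/util/retry"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithRetry re-reads the latest revision of obj before every update
// attempt and retries on conflict, so a stale resourceVersion no longer
// surfaces as a reconciler error.
func updateWithRetry(ctx context.Context, c client.Client, obj client.Object, mutate func()) error {
    key := client.ObjectKeyFromObject(obj)
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // Fetch the current revision into obj (refreshes resourceVersion).
        if err := c.Get(ctx, key, obj); err != nil {
            return err
        }
        // Re-apply the desired changes on top of the fresh copy.
        mutate()
        return c.Update(ctx, obj)
    })
}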

I am still unsure how the result spec can be empty if the operation is not successful though.

I am wondering whether you see any issues in the k8sgpt pod logs. The k8sgpt pod makes the inference call to your AI backend; if that call fails, it may get an empty response back and write it to the Result object.