k8sgpt-ai / k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster
https://k8sgpt.ai
Apache License 2.0
276 stars, 73 forks

[Bug Report]: Empty JSON results after installation journey in README.md #446

Open sjflausino opened 1 month ago

sjflausino commented 1 month ago

Checklist

Affected Components

K8sGPT Version

v0.3.8

Kubernetes Version

v1.28.9-eks-036c24b

Host OS and its Version

No response

Steps to reproduce

helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
kubectl create secret generic k8sgpt-sample-secret --from-literal=openai-api-key=$OPENAI_TOKEN -n k8sgpt-operator-system
kubectl apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    # anonymized: false
    # language: english
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8
  #integrations:
  # trivy:
  #  enabled: true
  #  namespace: trivy-system
  # filters:
  #   - Ingress
  # sink:
  #   type: slack
  #   webhook: <webhook-url> # use the sink secret if you want to keep your webhook url private
  #   secret:
  #     name: slack-webhook
  #     key: url
  #extraOptions:
  #   backstage:
  #     enabled: true
EOF
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: broken-pod
  namespace: default
spec:
  containers:
    - name: broken-pod
      image: nginx:1.a.b.c
      livenessProbe:
        httpGet:
          path: /
          port: 81
        initialDelaySeconds: 3
        periodSeconds: 3
EOF

Expected behaviour

❯ kubectl get results -o json | jq .
{
  "apiVersion": "v1",
  "items": [
    {
      "apiVersion": "core.k8sgpt.ai/v1alpha1",
      "kind": "Result",
      "spec": {
        "details": "The error message means that the service in Kubernetes doesn't have any associated endpoints, which should have been labeled with \"control-plane=controller-manager\". \n\nTo solve this issue, you need to add the \"control-plane=controller-manager\" label to the endpoint that matches the service. Once the endpoint is labeled correctly, Kubernetes can associate it with the service, and the error should be resolved.",

Actual behaviour

kubectl get results -o json | jq .
{
  "apiVersion": "v1",
  "items": [],
  "kind": "List",
  "metadata": {
    "resourceVersion": ""
  }
}

Additional Information

I installed the chart, created the secret from OPENAI_TOKEN, applied the K8sGPT CR, and created a broken pod as in the CLI example. When I run the command that lists the Result CRs, I get an empty JSON list, and kubectl get results -A also returns an empty list.

I configured the CLI with the same OPENAI_TOKEN, and k8sgpt analyze --explain returns the correct response. As far as I can tell, the operator pods log no errors or warnings that would indicate a configuration problem.

What should I try next to get the k8sgpt operator working? I am available for any clarification. Below is the YAML of the deployments that were created:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2024-05-15T21:20:55Z"
  generation: 1
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
  ownerReferences:
  - apiVersion: core.k8sgpt.ai/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: K8sGPT
    name: k8sgpt-sample
    uid: bc7fdb29-1559-4afb-a5e5-7cd6c8599781
  resourceVersion: "76685"
  uid: 0a972892-8b4b-4c8b-9e85-f20fa0bb0e27
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: k8sgpt-sample
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: k8sgpt-sample
    spec:
      containers:
      - args:
        - serve
        env:
        - name: K8SGPT_MODEL
          value: gpt-3.5-turbo
        - name: K8SGPT_BACKEND
          value: openai
        - name: XDG_CONFIG_HOME
          value: /k8sgpt-data/.config
        - name: XDG_CACHE_HOME
          value: /k8sgpt-data/.cache
        - name: K8SGPT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: openai-api-key
              name: k8sgpt-sample-secret
        image: ghcr.io/k8sgpt-ai/k8sgpt:v0.3.8
        imagePullPolicy: Always
        name: k8sgpt
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 200m
            memory: 156Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /k8sgpt-data
          name: k8sgpt-vol
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: k8sgpt
      serviceAccountName: k8sgpt
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: k8sgpt-vol
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-05-15T21:20:58Z"
    lastUpdateTime: "2024-05-15T21:20:58Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-05-15T21:20:55Z"
    lastUpdateTime: "2024-05-15T21:20:58Z"
    message: ReplicaSet "k8sgpt-sample-579988657d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: release
    meta.helm.sh/release-namespace: k8sgpt-operator-system
  creationTimestamp: "2024-05-15T21:20:30Z"
  generation: 1
  labels:
    app.kubernetes.io/component: manager
    app.kubernetes.io/created-by: k8sgpt-operator
    app.kubernetes.io/instance: release
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k8sgpt-operator
    app.kubernetes.io/part-of: k8sgpt-operator
    app.kubernetes.io/version: 0.0.26
    control-plane: controller-manager
    helm.sh/chart: k8sgpt-operator-0.1.4
  name: release-k8sgpt-operator-controller-manager
  namespace: k8sgpt-operator-system
  resourceVersion: "76590"
  uid: 56845cb5-480a-47e9-aba5-7a2e62d768f2
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: release
      app.kubernetes.io/name: k8sgpt-operator
      control-plane: controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: release
        app.kubernetes.io/name: k8sgpt-operator
        control-plane: controller-manager
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
                - ppc64le
                - s390x
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - args:
        - --secure-listen-address=0.0.0.0:8443
        - --upstream=http://127.0.0.1:8080/
        - --logtostderr=true
        - --v=0
        env:
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: cluster.local
        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.16.0
        imagePullPolicy: IfNotPresent
        name: kube-rbac-proxy
        ports:
        - containerPort: 8443
          name: https
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 5m
            memory: 64Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - --health-probe-bind-address=:8081
        - --metrics-bind-address=127.0.0.1:8080
        - --leader-elect
        command:
        - /manager
        env:
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: cluster.local
        - name: OPERATOR_SINK_WEBHOOK_TIMEOUT_SECONDS
          value: 30s
        image: ghcr.io/k8sgpt-ai/k8sgpt-operator:v0.1.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 64Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
      serviceAccount: release-k8sgpt-operator-controller-manager
      serviceAccountName: release-k8sgpt-operator-controller-manager
      terminationGracePeriodSeconds: 10
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-05-15T21:20:41Z"
    lastUpdateTime: "2024-05-15T21:20:41Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-05-15T21:20:30Z"
    lastUpdateTime: "2024-05-15T21:20:41Z"
    message: ReplicaSet "release-k8sgpt-operator-controller-manager-7885ffcd45" has
      successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
JuHyung-Son commented 1 month ago

Can you check the operator logs? There is a backoff in the AI backend. Set enabled: false under spec.ai and test again.
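The two suggestions above could be carried out roughly as follows. This is a minimal sketch, not a confirmed fix: the namespace, deployment name, container name, and CR name are taken from the manifests posted in this issue, the function names are hypothetical helpers, and it assumes the K8sGPT custom resource can be addressed as `k8sgpt` by kubectl.

```shell
#!/bin/sh
# Names copied from the manifests in this issue; adjust them for your cluster.
NS="k8sgpt-operator-system"
OPERATOR_DEPLOY="release-k8sgpt-operator-controller-manager"
CR_NAME="k8sgpt-sample"

# Show recent logs from the operator's "manager" container.
operator_logs() {
  kubectl logs "deployment/${OPERATOR_DEPLOY}" -n "${NS}" -c manager --tail=100
}

# Turn off the AI backend on the K8sGPT CR with a merge patch
# (assumes the resource is addressable as "k8sgpt").
disable_ai() {
  kubectl patch k8sgpt "${CR_NAME}" -n "${NS}" --type merge -p '{"spec":{"ai":{"enabled":false}}}'
}

# List Result CRs in every namespace to see whether any were created.
list_results() {
  kubectl get results -A
}
```

After sourcing the script, run `operator_logs` first to look for backend errors, then `disable_ai`, wait for the operator to reconcile, and run `list_results`. If results appear with the AI backend disabled, that would point at the backend call rather than the analyzers.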