k8sgpt-ai / k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster
https://k8sgpt.ai
Apache License 2.0

[Feature]: Support IPv6 k8s clusters #369

Closed mjhumkhawala-ias closed 4 months ago

mjhumkhawala-ias commented 4 months ago

Is this feature request related to a problem?

Yes

Problem Description

When the k8sGPT operator is deployed in a single-stack (IPv6-only) Kubernetes cluster, reconciliation fails with the following error:

Creating new client for fdw1:9we0:6098::cw6f:8080
Finished Reconciling k8sGPT with error: dial tcp: address fdw1:9we0:6098::cw6f:8080: too many colons in address
2024-03-06T15:39:24Z    ERROR   Reconciler error    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-agent","namespace":"k8sgpt"}, "namespace": "k8sgpt", "name": "k8sgpt-agent", "reconcileID": "8a3bcc27-3fbf-4a86-8d73-f1fbbc98942c", "error": "dial tcp: address fdw1:9we0:6098::cw6f:8080: too many colons in address"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
Creating new client for fdw1:9we0:6098::cw6f:8080
Finished Reconciling k8sGPT with error: dial tcp: address fdw1:9we0:6098::cw6f:8080: too many colons in address
2024-03-06T15:56:04Z    ERROR   Reconciler error    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-agent","namespace":"k8sgpt"}, "namespace": "k8sgpt", "name": "k8sgpt-agent", "reconcileID": "64027712-778e-429d-ad64-28acede10079", "error": "dial tcp: address fdw1:9we0:6098::cw6f:8080: too many colons in address"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226

Querying the results objects also returns an empty list:

$ kubectl get results -o json | jq .
{
  "apiVersion": "v1",
  "items": [],
  "kind": "List",
  "metadata": {
    "resourceVersion": ""
  }
}

Solution Description

Support running the k8sgpt operator in a single-stack IPv6 Kubernetes cluster.

Benefits

As Kubernetes warms up to IPv6, it becomes advantageous for k8sgpt to also offer support for it.

Potential Drawbacks

No response

Additional Information

The matter was initially brought up in the Slack channel, following which I was recommended to create a GitHub issue.

samrocketman commented 4 months ago

For IPv6, it looks like you need to detect the IPv6 address format and change

https://github.com/k8sgpt-ai/k8sgpt-operator/blob/95944620ffcbf570201657d400e414e2a33ff169/pkg/client/client.go#L66

from Dial tcp to Dial tcp6.

samrocketman commented 4 months ago

Correction: this is likely the bug: https://github.com/k8sgpt-ai/k8sgpt-operator/blob/95944620ffcbf570201657d400e414e2a33ff169/pkg/client/client.go#L61

The Go net package documentation says:

A literal IPv6 address in hostport must be enclosed in square brackets, as in "[::1]:80", "[::1%lo0]:80".

https://pkg.go.dev/net

samrocketman commented 4 months ago

Did you confirm that this works in IPv6 k8s clusters?

JuHyung-Son commented 4 months ago

Actually, I haven't. Doesn't it work?

samrocketman commented 4 months ago

> Actually, I haven't. Doesn't it work?

The reporter is running an IPv6 cluster, so they can confirm. I made the changes based on the Go documentation. If they confirm it works, great. If not, I can fix the bug.

AlexsJones commented 4 months ago

Do we need to walk this back?

JuHyung-Son commented 4 months ago

I will test it on minikube.

JuHyung-Son commented 4 months ago

I tested it on kind (minikube does not support IPv6),

and it works. You can see the IPs below and in the controller logs: the operator picks up the IPv6 pod IP.

However, the LLM request itself failed. I guess this is because my local cluster setup is incomplete.

Pods(all)[11]
NAMESPACE               NAME                                                 PF  READY  RESTARTS  STATUS   IP              NODE                AGE
k8sgpt-operator-system  eks-prod-7f586bdb68-r929q                            ●   1/1    0         Running  fd00:10:244::7  ipv6-control-plane  13m
k8sgpt-operator-system  k8sgpt-operator-controller-manager-776d7f5b7f-fg7tm  ●   2/2    0         Running  fd00:10:244::5  ipv6-control-plane  19m

LLM request failure log:

{"level":"info","ts":1710768169.1691122,"caller":"server/log.go:50","msg":"request failed. failed while calling AI provider localai: Post \"https://ap-northeast-2.apistage.ai/v1/solar/chat/completions\": dial tcp: lookup ap-northeast-2.apistage.ai on [fd00:10:96::a]:53: server misbehaving","duration_ms":9,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\"  namespace:\"default\"  explain:true  language:\"ko\"  max_concurrency:10  output:\"json\"  filters:\"Platform\"  filters:\"Service\"  filters:\"CronJob\"  filters:\"Node\"  filters:\"PersistentVolumeClaim\"  filters:\"Ingress\"","remote_addr":"[fd00:10:244::5]:45202","status_code":2}

AlexsJones commented 4 months ago

Thanks @JuHyung-Son for testing