[Feature]: Constructing a Resource Graph in Kubernetes and Transmitting Information Including Subgraphs Near Problematic Nodes to LLM

kimchaeri commented 7 months ago

Checklist

[X] I've searched for similar issues and couldn't find anything matching
[X] I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

Currently, K8sGPT reports the causes of errors and potential solutions only for the individual component where each error occurs, so it sometimes provides inaccurate responses. If we instead model the Kubernetes cluster as a graph and give the LLM information about the resources surrounding the node where an error occurs, it can provide more accurate answers. In simple tests, I confirmed that providing the subgraph as context led the LLM to give more accurate answers.
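
A minimal Python sketch of what "providing surrounding information" could look like when the prompt is assembled. Everything here is hypothetical: `Related`, `build_prompt`, the relation names, and the prompt wording are illustrative assumptions, not part of K8sGPT.

```python
from dataclasses import dataclass


@dataclass
class Related:
    """A resource near the failing one (hypothetical shape, for illustration)."""
    kind: str
    name: str
    relation: str  # e.g. "owned-by" or "selects"; names are illustrative
    status: str    # short, already-summarized status text


def build_prompt(kind: str, name: str, error: str, related: list[Related]) -> str:
    """Combine an analyzer error with a summary of neighbouring resources."""
    lines = [f"Error in {kind}/{name}: {error}", "Related resources:"]
    for r in related:
        lines.append(f"- {r.kind}/{r.name} ({r.relation}): {r.status}")
    lines.append("Explain the most likely root cause and how to fix it.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "Pod", "web-7d9f-abcde", "CrashLoopBackOff",
        [Related("ReplicaSet", "web-7d9f", "owned-by", "1/3 replicas ready"),
         Related("Service", "web", "selects", "0 ready endpoints")],
    ))
```

The point of the design is that the LLM sees the failing object and its neighbours in one message, instead of the error alone.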

Solution Description

Construct a graph of the Kubernetes cluster and integrate it with the K8s analyzer
Extract the subgraph around error-prone nodes to provide context to the LLM
Visualize the subgraph around error-prone nodes for error analysis
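
The first two points can be prototyped roughly as follows. This is a sketch under simplifying assumptions: resources are plain dicts shaped like trimmed Kubernetes objects, edges come only from `metadata.ownerReferences`, and "near" means within a fixed number of hops. `build_graph` and `subgraph` are hypothetical names, not existing K8sGPT functions.

```python
from collections import deque


def build_graph(resources):
    """Build an undirected adjacency map (uid -> set of uids) from ownerReferences.

    Each resource is a dict shaped like a trimmed Kubernetes object:
    {"kind": ..., "metadata": {"name": ..., "uid": ..., "ownerReferences": [...]}}
    Only the "uid" field of each ownerReference is used here.
    """
    by_uid = {r["metadata"]["uid"]: r for r in resources}
    adj = {uid: set() for uid in by_uid}  # isolated resources stay as nodes
    for r in resources:
        uid = r["metadata"]["uid"]
        for ref in r["metadata"].get("ownerReferences", []):
            owner = ref["uid"]
            if owner in adj:  # ignore owners that were not collected
                adj[uid].add(owner)
                adj[owner].add(uid)
    return by_uid, adj


def subgraph(adj, start, hops):
    """Return the uids reachable from `start` within `hops` edges (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen
```

Edges are treated as undirected so that, starting from a failing Pod, two hops reach both its owners (ReplicaSet, Deployment) and its sibling Pods.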

Benefits

Exploring the various resources in Kubernetes and visualizing them as graphs helps users understand the system's structure.
By using information about the resources around the point where an error occurs, users can obtain more accurate answers.
By visualizing the subgraph associated with an error, users can see detailed information related to the root cause of the problem, rather than just error messages.
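
For the visualization benefit, a small sketch that turns an extracted subgraph into Graphviz DOT text, which can then be rendered with the `dot` tool (for example `dot -Tsvg`). The function name, the `error_context` graph name, and the red highlighting are illustrative choices; labels are assumed not to contain double quotes.

```python
def to_dot(nodes, edges, highlight):
    """Render a subgraph as Graphviz DOT, highlighting the failing node.

    nodes: dict of node id -> label (assumed free of double quotes)
    edges: iterable of (src_id, dst_id, relation)
    """
    out = ["digraph error_context {"]
    for nid, label in sorted(nodes.items()):
        style = " color=red style=bold" if nid == highlight else ""
        out.append(f'  "{nid}" [label="{label}"{style}];')
    for src, dst, rel in edges:
        out.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
    out.append("}")
    return "\n".join(out)
```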

Potential Drawbacks

No response

Additional Information

No response

AlexsJones commented 7 months ago

This is a really interesting concept. I think it could be a powerful feature.

qdrddr commented 7 months ago

You could build a graph based on installed Helm charts, or by using GitOps tools such as ArgoCD or FluxCD. FYI @kimchaeri

kimchaeri commented 7 months ago

Oh, thank you for sharing that with me!

vedant-8680 commented 7 months ago

Try using this tool to generate the graph: https://github.com/benc-uk/kubeview. It builds something similar to what ArgoCD or FluxCD do for Kubernetes resources. @kimchaeri @qdrddr

arbreezy commented 7 months ago

Extract subgraph near error-prone nodes to provide context to LLM

I think this looks like a clever approach.

I believe we also want to distinguish ownership from selectors and labels when we build a graph, or more generally when we create relationships between K8s resources.

Deriving ownership from metadata.ownerReferences is a good start, but I think labels can build wider, workload-level relationships, so we can contextualize the errors that we pass to LLMs.
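
A minimal sketch of keeping the two kinds of relationship apart by tagging each edge with its source. Resources are again assumed to be plain dicts shaped like trimmed Kubernetes objects, and only Service-to-Pod selectors are handled; Deployment/ReplicaSet `matchLabels`/`matchExpressions` selectors are left out for brevity. The function name and the relation names `owns` and `selects` are hypothetical.

```python
def relations(resources):
    """Return typed edges (src_uid, dst_uid, relation) between resources.

    "owns"    : derived from metadata.ownerReferences (owner -> dependent)
    "selects" : a Service whose spec.selector matches a Pod's labels in the
                same namespace (every key/value pair in the selector must match)
    """
    by_uid = {r["metadata"]["uid"]: r for r in resources}
    edges = []
    for r in resources:
        for ref in r["metadata"].get("ownerReferences", []):
            if ref["uid"] in by_uid:
                edges.append((ref["uid"], r["metadata"]["uid"], "owns"))

    pods = [r for r in resources if r["kind"] == "Pod"]
    for svc in (r for r in resources if r["kind"] == "Service"):
        selector = (svc.get("spec") or {}).get("selector") or {}
        if not selector:
            continue  # selector-less Services do not select Pods
        ns = svc["metadata"].get("namespace")
        for pod in pods:
            labels = pod["metadata"].get("labels") or {}
            if pod["metadata"].get("namespace") == ns and all(
                labels.get(k) == v for k, v in selector.items()
            ):
                edges.append((svc["metadata"]["uid"], pod["metadata"]["uid"], "selects"))
    return edges
```

Keeping the relation on each edge lets a later step decide, for example, to follow `owns` edges for root-cause context but `selects` edges only one hop.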

e.g. a workload X may consist of an Ingress, a Service, a Deployment, and a CronJob, and K8sGPT would generate error messages specific to this workload.
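
Following that example, a minimal sketch of grouping a workload by a shared label and attaching analyzer errors to it, so the LLM sees the Ingress, Service, Deployment, and CronJob together. It assumes the workload is tagged with the Kubernetes recommended label `app.kubernetes.io/part-of`; that label choice, `group_workloads`, and the shape of `errors` are assumptions for illustration.

```python
from collections import defaultdict

# Kubernetes recommended label naming the higher-level application a resource is part of.
PART_OF = "app.kubernetes.io/part-of"


def group_workloads(resources, errors):
    """Group resources into workloads by the part-of label and attach errors.

    resources: trimmed objects ({"kind": ..., "metadata": {"name": ..., "labels": {...}}})
    errors:    hypothetical analyzer output, {(kind, name): message}
    Resources without the label are skipped.
    """
    groups = defaultdict(lambda: {"members": [], "errors": []})
    for r in resources:
        kind, name = r["kind"], r["metadata"]["name"]
        workload = (r["metadata"].get("labels") or {}).get(PART_OF)
        if workload is None:
            continue
        groups[workload]["members"].append(f"{kind}/{name}")
        if (kind, name) in errors:
            groups[workload]["errors"].append(f"{kind}/{name}: {errors[(kind, name)]}")
    return dict(groups)
```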

Construct Kubernetes cluster as a graph to integrate with K8s analyzer

I think an integration with another OSS tool that provides this capability would be a great start.

miguelvr commented 6 months ago

I was about to open a similar issue, and found this one.

It makes perfect sense to take into account resource ownership by leveraging a tool like https://github.com/ahmetb/kubectl-tree
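
To illustrate the kind of view kubectl-tree gives, a small sketch that walks `metadata.ownerReferences` downward from a root object and prints an indented tree. This is not kubectl-tree's code; the function name and the plain-dict resource shape are assumptions, and ownership is assumed to contain no cycles.

```python
def ownership_tree(resources, root_uid):
    """Render the ownership tree below root_uid as indented text.

    Built only from the given list and metadata.ownerReferences;
    assumes the ownership relation contains no cycles.
    """
    by_uid = {r["metadata"]["uid"]: r for r in resources}
    children = {}
    for r in resources:
        for ref in r["metadata"].get("ownerReferences", []):
            children.setdefault(ref["uid"], []).append(r["metadata"]["uid"])

    def label(uid):
        r = by_uid[uid]
        return f'{r["kind"]}/{r["metadata"]["name"]}'

    lines = []

    def walk(uid, depth):
        lines.append("  " * depth + label(uid))
        for child in sorted(children.get(uid, []), key=label):
            walk(child, depth + 1)

    walk(root_uid, 0)
    return "\n".join(lines)
```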

Later on, we can include common labels or annotations, like the ones used by ArgoCD or Helm, to group applications.
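
A minimal sketch of that grouping step, trying a configurable list of metadata keys in order. The default keys are assumptions to verify against the cluster: recent Helm 3 releases record the release name in the `meta.helm.sh/release-name` annotation on resources they manage, and `app.kubernetes.io/instance` is a common label set by Helm charts and used by Argo CD's label-based resource tracking. The function names are hypothetical.

```python
# Assumed grouping keys, tried in order; adjust to match your cluster's conventions.
DEFAULT_KEYS = (
    ("annotations", "meta.helm.sh/release-name"),  # Helm 3 release annotation (assumed present)
    ("labels", "app.kubernetes.io/instance"),      # common Helm chart / Argo CD label
)


def application_of(obj, keys=DEFAULT_KEYS):
    """Return the first application/release name found on obj, or None."""
    meta = obj.get("metadata", {})
    for section, key in keys:
        value = (meta.get(section) or {}).get(key)
        if value:
            return value
    return None


def group_applications(objs, keys=DEFAULT_KEYS):
    """Map application name -> ["Kind/name", ...] for objects that carry a key."""
    apps = {}
    for obj in objs:
        app = application_of(obj, keys)
        if app is not None:
            apps.setdefault(app, []).append(f'{obj["kind"]}/{obj["metadata"]["name"]}')
    return apps
```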
