k8sgpt-ai / k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster
https://k8sgpt.ai
Apache License 2.0
295 stars 81 forks source link

feat!: monitoring multiple clusters #266

Closed prometherion closed 7 months ago

prometherion commented 10 months ago

Closes #265

📑 Description

⚠️ This is PR that introduces breaking changes, please, read it carefully

✅ Checks

ℹ Additional Information

There's no need to bump the k8sgpt Custom Resource Definition version since there are just additions of fields, and not a reorg of them. Of course, to take full advantage of this feature, if the user has installed the Operator through Helm, they must manually update the generated Custom Resource Definition.

AlexsJones commented 9 months ago

Overall looks good A) I'd like to see an example in the readme and samples of how to use this B) I think there needs to be some thought around how a user would add a kubeconfig after having an existing deployment, or removing one - and if there would be any impact or damage.

We will need to think about how to release this

prometherion commented 9 months ago

Thanks for the review, Alex.

I just provided a brief overview of the feature and explained the use cases we're building on our infrastructure.

AlexsJones commented 8 months ago

Apologies for the delay @prometherion if you can resolve conflicts we will merge this and inform the community of the change

prometherion commented 8 months ago

Thanks, @AlexsJones, just rebased.

prometherion commented 8 months ago

@AlexsJones I'd like to do my best to get this merged, and I tested it thoroughly.

In the case of a previous k8sgt-operator installation, the previous Deployment won't be modified since the change introduced with this PR requires a label selector 1:1 named with the K8sGPT instance.

With that said, although there are some errors in the logs of the operator, we're going to preserve the previous behaviour till the cluster administrator deletes manually the k8sgpt instance Deployment to let it deploy according to the new template.

Furthermore, since we have an additional field, there's no need to bump the CRD version since it's a backward-compatible change: I'm raising the seniority flag here since it's the same review comment I received from Vince on the AWS Cluster API provider.

Implementing a new CRD version would be an overkill since it would require a conversion webhook which requires a TLS certificate, also considering the state of the current version (v1alpha1) which accepts breaking changes, although this is not the case.

arbreezy commented 7 months ago

It looks good and I agree @prometherion we don't need yet a conversion hook. Let me test it a little bit more, due to recent changes been merged and I will update the review on Wednesday. Thanks for your patience :)