solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Orphaned ClusterRoleBinding not cleaned up on uninstall #5869

Open jameshbarton opened 2 years ago

jameshbarton commented 2 years ago

Gloo Edge Version

1.10.x (latest stable)

Kubernetes Version

No response

Describe the bug

An enterprise user tried to uninstall a recent 1.10 release with glooctl uninstall --all. The uninstall failed to delete the ClusterRoleBinding glooe-prometheus-kube-state-metrics. That caused a subsequent attempt at reinstallation to fail.

$ glooctl uninstall --all
Uninstalling Gloo Edge...
Removing Gloo system components from namespace gloo-system...
Removing Gloo CRDs...
Removing namespace gloo-system... Done.

Gloo was successfully uninstalled.

$ glooctl install gateway enterprise  --license-key "a-valid-license-key"
Creating namespace gloo-system... Done.
Starting Gloo Edge Enterprise installation...

Gloo failed to install! Detailed logs available at /Users/xxx/.gloo/debug.log.
Error: installing Gloo Edge Enterprise in gateway mode: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRoleBinding "glooe-prometheus-kube-state-metrics" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "gloo"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "gloo-system"

$ cat  /Users/xxx/.gloo/debug.log
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
customresourcedefinition.apiextensions.k8s.io "gateways.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "proxies.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "settings.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "upstreams.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "upstreamgroups.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "virtualservices.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "routetables.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "authconfigs.enterprise.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "ratelimitconfigs.ratelimit.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "virtualhostoptions.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "routeoptions.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "glooinstances.fed.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "failoverschemes.fed.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedauthconfigs.fed.enterprise.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedgateways.fed.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedroutetables.fed.gateway.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedsettings.fed.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedupstreamgroups.fed.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedupstreams.fed.gloo.solo.io" deleted
customresourcedefinition.apiextensions.k8s.io "federatedvirtualservices.fed.gateway.solo.io" deleted
namespace "gloo-system" deleted

Steps to reproduce the bug

  1. Install Edge Enterprise on EKS.
  2. Run glooctl uninstall --all.
  3. Observe the orphaned ClusterRoleBinding: kubectl get clusterrolebinding glooe-prometheus-kube-state-metrics
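
To check for any cluster-scoped leftovers rather than only the one binding named in step 3, something like the following might help. It is a minimal sketch: the function name `find_leftover_crbs` and the default name filter `gloo` are assumptions made for illustration, not part of glooctl.

```shell
# Hypothetical helper: list ClusterRoleBindings whose name contains a
# substring (default "gloo"). Returns non-zero when nothing matches.
find_leftover_crbs() {
  pattern="${1:-gloo}"
  # -o name prints one "<resource>/<name>" entry per line
  kubectl get clusterrolebinding -o name | grep -- "$pattern"
}
```

Running `find_leftover_crbs` after `glooctl uninstall --all` should print nothing on a clean cluster. A name match is only a hint, so each hit still needs to be checked before deleting it.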

Expected Behavior

All ClusterRoleBindings should be removed on glooctl uninstall --all.

Additional Context

This happened on EKS for a customer.

jameshbarton commented 2 years ago

The workaround is to delete the ClusterRoleBinding manually before reinstalling: kubectl delete clusterrolebinding glooe-prometheus-kube-state-metrics
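
If the workaround needs to go into a script that may run more than once, the delete can be made safe to repeat. This is a sketch only: the wrapper name `remove_orphaned_crb` is hypothetical, and `--ignore-not-found` simply makes `kubectl delete` succeed when the binding is already gone.

```shell
# Hypothetical wrapper around the manual workaround. Defaults to the
# binding named in this issue; pass another name to override.
remove_orphaned_crb() {
  name="${1:-glooe-prometheus-kube-state-metrics}"
  # --ignore-not-found: do not fail if the binding no longer exists
  kubectl delete clusterrolebinding "$name" --ignore-not-found
}
```

Call `remove_orphaned_crb` before rerunning `glooctl install gateway enterprise`.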

nfuden commented 2 years ago

I think the real issue here was how we ended up with the orphaned cluster role binding losing its ownership annotation: meta.helm.sh/release-name: gloo

It is expected that glooctl uninstall does not remove resources that are not owned by Gloo, but the fact that we got to this point is troubling and deserves an issue in and of itself.
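
To confirm whether a binding still carries the ownership metadata, the two annotations named in the install error (`meta.helm.sh/release-name` and `meta.helm.sh/release-namespace`) can be read back with a jsonpath query. The sketch below assumes that approach; the function name `helm_owner_of_crb` is made up for illustration.

```shell
# Hypothetical helper: report the Helm release that owns a
# ClusterRoleBinding, or return non-zero when either annotation is missing.
helm_owner_of_crb() {
  name="$1"
  # Dots inside the annotation key must be escaped in kubectl jsonpath
  release="$(kubectl get clusterrolebinding "$name" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}')"
  namespace="$(kubectl get clusterrolebinding "$name" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-namespace}')"
  if [ -z "$release" ] || [ -z "$namespace" ]; then
    echo "$name: missing Helm ownership annotations"
    return 1
  fi
  echo "$name: owned by release $release in namespace $namespace"
}
```

Running `helm_owner_of_crb glooe-prometheus-kube-state-metrics` on a healthy install, before uninstalling, would show whether the annotations were ever present. That could help narrow down when they were lost.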

github-actions[bot] commented 4 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.