Closed wahabmk closed 2 weeks ago
I tried with v0.39.0 and I see that if I delete a Profile and the only matching Cluster, as soon as CAPI cluster is gone, the Profile is gone.
If the CAPI cluster stays in the deleting state, this is expected behaviour.
Sveltos will still try to remove resources and, because the cluster is not reachable, it will fail. While this operation might appear unnecessary, triggering Sveltos is needed even when a cluster is deleted (Sveltos might have created resources in the management cluster for such a cluster, and those resources need to go).
Yes, Sveltos logic could be enhanced (when a matching cluster is deleted, only remove resources in the management cluster and ignore what was deployed on the managed cluster). But that will complicate Sveltos code, so I would like to avoid it.
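For illustration only, the enhancement described above (skip the managed cluster once it is being deleted, but still clean up the management cluster) could be sketched roughly as follows. This is not Sveltos code: the `Cluster` and `CleanupPlan` types, the `plan_cleanup` function, and the resource names are all hypothetical, and the real controller would of course work against the Kubernetes API rather than plain lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for a matching cluster (not a Sveltos or CAPI type)."""
    name: str
    # Set once deletion has been requested, mirroring metadata.deletionTimestamp.
    deletion_timestamp: Optional[str] = None


@dataclass
class CleanupPlan:
    """Resources to remove, split by where they live."""
    management_cluster: List[str] = field(default_factory=list)
    managed_cluster: List[str] = field(default_factory=list)


def plan_cleanup(cluster: Cluster,
                 mgmt_resources: List[str],
                 managed_resources: List[str]) -> CleanupPlan:
    """Decide what to remove when a Profile no longer applies to a cluster."""
    # Resources created in the management cluster always need to go.
    plan = CleanupPlan(management_cluster=list(mgmt_resources))
    # Only reach into the managed cluster while it is not being deleted;
    # once deletion has started its API server may already be unreachable.
    if cluster.deletion_timestamp is None:
        plan.managed_cluster = list(managed_resources)
    return plan
```

With a cluster whose `deletion_timestamp` is set, the plan would contain only management-cluster resources, so nothing would need to contact the unreachable API server.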
@gianlucam76 You are correct. I can still see that the CAPI cluster exists but in deleting state:
➜ ~ kubectl -n hmc-system get cluster
NAME           CLUSTERCLASS   PHASE      AGE   VERSION
wali-aws-dev                  Deleting   20h
Please feel free to close this if this is expected behaviour. Thanks!
@gianlucam76 I think I might have encountered a deadlock.
➜ ~ kubectl -n hmc-system get awscluster
No resources found in hmc-system namespace.
I1017 20:07:48.714528 1 machine_controller.go:357] "Skipping deletion of Kubernetes Node associated with Machine as it is not allowed" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="hmc-system/wali-aws-dev-md-2rwhj-glgvc" namespace="hmc-system" name="wali-aws-dev-md-2rwhj-glgvc" reconcileID="96de03a1-490a-4366-9cc2-7c8058b1c955" MachineSet="hmc-system/wali-aws-dev-md-2rwhj" Cluster="hmc-system/wali-aws-dev" Node="wali-aws-dev-md-2rwhj-glgvc" cause="cluster is being deleted"
I1017 20:07:48.719835 1 machine_controller.go:452] "Waiting for infrastructure to be deleted" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="hmc-system/wali-aws-dev-md-2rwhj-glgvc" namespace="hmc-system" name="wali-aws-dev-md-2rwhj-glgvc" reconcileID="96de03a1-490a-4366-9cc2-7c8058b1c955" MachineSet="hmc-system/wali-aws-dev-md-2rwhj" Cluster="hmc-system/wali-aws-dev" AWSMachine="hmc-system/wali-aws-dev-md-2rwhj-glgvc"
I1017 20:07:51.390829 1 cluster_controller.go:269] "Cluster still has descendants - need to requeue" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="hmc-system/wali-aws-dev" namespace="hmc-system" name="wali-aws-dev" reconcileID="05443716-a94e-4bcc-8be6-530d4114b9dc" descendants="Worker machines: wali-aws-dev-md-2rwhj-glgvc" indirect descendants count=1
I1017 20:07:56.392011 1 cluster_controller.go:269] "Cluster still has descendants - need to requeue" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="hmc-system/wali-aws-dev" namespace="hmc-system" name="wali-aws-dev" reconcileID="35941c59-27c7-46fc-9a84-a21597ccf222" descendants="Worker machines: wali-aws-dev-md-2rwhj-glgvc" indirect descendants count=1
I1016 23:21:22.030322 1 controller.go:302] "Reconciling" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="hmc-system/p--wali-aws-dev-capi-wali-aws-dev" namespace="hmc-system" name="p--wali-aws-dev-capi-wali-aws-dev" reconcileID="3d98ee4c-2713-45bf-ae9b-fa7f7000fd77"
I1016 23:21:22.030397 1 clustersummary_controller.go:122] "Reconciling" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="hmc-system/p--wali-aws-dev-capi-wali-aws-dev" namespace="hmc-system" name="p--wali-aws-dev-capi-wali-aws-dev" reconcileID="3d98ee4c-2713-45bf-ae9b-fa7f7000fd77"
I1016 23:21:22.030893 1 clustersummary_controller.go:225] "Reconciling ClusterSummary delete" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="hmc-system/p--wali-aws-dev-capi-wali-aws-dev" namespace="hmc-system" name="p--wali-aws-dev-capi-wali-aws-dev" reconcileID="3d98ee4c-2713-45bf-ae9b-fa7f7000fd77"
E1016 23:21:22.045361 1 clustersummary_controller.go:250] "failed to remove ResourceSummary." err="failed to get API group resources: unable to retrieve the complete list of server APIs: apiextensions.k8s.io/v1: Get \"https://wali-aws-dev-apiserver-1185732421.ca-central-1.elb.amazonaws.com:6443/apis/apiextensions.k8s.io/v1\": dial tcp: lookup wali-aws-dev-apiserver-1185732421.ca-central-1.elb.amazonaws.com on 10.96.0.10:53: no such host" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="hmc-system/p--wali-aws-dev-capi-wali-aws-dev" namespace="hmc-system" name="p--wali-aws-dev-capi-wali-aws-dev" reconcileID="3d98ee4c-2713-45bf-ae9b-fa7f7000fd77"
I1016 23:21:22.045893 1 controller.go:318] "Reconcile done, requeueing after 10s" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="hmc-system/p--wali-aws-dev-capi-wali-aws-dev" namespace="hmc-system" name="p--wali-aws-dev-capi-wali-aws-dev" reconcileID="3d98ee4c-2713-45bf-ae9b-fa7f7000fd77"
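When a controller loops like this, it can help to pull the error lines and their key="value" fields out of the log. The following is a minimal sketch, assuming the lines follow the klog-style header seen above (severity letter, month and day, time, thread id, file:line, then a quoted message). The `parse_line` and `errors_only` helpers are hypothetical names, and unquoted fields such as `count=1` are deliberately ignored.

```python
import re

# Header: severity + mmdd, time, thread id, source file:line, quoted message.
HEADER = re.compile(
    r'^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ ([\w.]+):(\d+)\] '
    r'"((?:[^"\\]|\\.)*)"'
)
# key="value" pairs; the value may contain backslash-escaped quotes.
FIELD = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    severity, date, time, source, lineno, message = m.groups()
    rest = line[m.end():]
    # Unescape \" inside values so error strings read naturally.
    fields = {k: v.replace('\\"', '"') for k, v in FIELD.findall(rest)}
    return {
        "severity": severity,
        "date": date,
        "time": time,
        "source": f"{source}:{lineno}",
        "message": message,
        "fields": fields,
    }


def errors_only(lines):
    """Return the parsed entries whose severity is E (error)."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["severity"] == "E"]
```

Feeding the addon-controller output through `errors_only` would leave only the `failed to remove ResourceSummary.` entries, with the DNS failure available under `fields["err"]`.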
Profile object is an indirect descendant of it. This seems to be a race condition because I didn't encounter it again.
Thanks @wahabmk.
Sveltos Profiles are not owned by CAPI (or related), and vice versa. With that said, I will enhance Sveltos to not wait for the cluster to go away. PR
From the logs you pasted, it seems the AWS infrastructure provider is preventing the cluster from going away. So I feel that in your case, though the Profile will go away, the Cluster will still remain.
I1017 20:07:48.714528 1 machine_controller.go:357] "Skipping deletion of Kubernetes Node associated with Machine as it is not allowed" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="hmc-system/wali-aws-dev-md-2rwhj-glgvc" namespace="hmc-system" name="wali-aws-dev-md-2rwhj-glgvc" reconcileID="96de03a1-490a-4366-9cc2-7c8058b1c955" MachineSet="hmc-system/wali-aws-dev-md-2rwhj" Cluster="hmc-system/wali-aws-dev" Node="wali-aws-dev-md-2rwhj-glgvc" cause="cluster is being deleted"
I1017 20:07:48.719835 1 machine_controller.go:452] "Waiting for infrastructure to be deleted" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="hmc-system/wali-aws-dev-md-2rwhj-glgvc" namespace="hmc-system" name="wali-aws-dev-md-2rwhj-glgvc" reconcileID="96de03a1-490a-4366-9cc2-7c8058b1c955" MachineSet="hmc-system/wali-aws-dev-md-2rwhj" Cluster="hmc-system/wali-aws-dev" AWSMachine="hmc-system/wali-aws-dev-md-2rwhj-glgvc"
Problem Description
I have a Profile object with its ownerReferences set to another object as can be seen below:
The ManagedCluster object actually spins up a CAPI cluster. So I was using the Profile object to deploy ingress-nginx and kyverno on this CAPI cluster.
When I deleted the ManagedCluster object, the deletionTimestamp field was also set on its dependent Profile object as can be seen above.
After deleting the ManagedCluster and its associated CAPI cluster, I can still see Sveltos objects present in the management cluster with deletionTimestamp set on them:
System Information
CLUSTER API OPERATOR: v0.12.0
KUBERNETES VERSION:
SVELTOS VERSION:
Logs
The addon-controller keeps on looping with the following error: