kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

ClusterClass continuously reconciling resource due to incorrect `apiVersion` #9557

Open mnaser opened 1 year ago

mnaser commented 1 year ago

What steps did you take and what happened?

When using CAPO, I noticed that one of my clusters was reconciling non-stop and eating up a ton of CPU. Upon further troubleshooting, I found that the reconciler does not actually use the latest version of the CRD when making its requests (my guess is that the stored object is still v1alpha6 while v1alpha7 is presented to the user).

You can see that v1alpha7 is the newest version:

❯ kubectl -n magnum-system get crd/openstackclusters.infrastructure.cluster.x-k8s.io -oyaml | grep 'cluster.x-k8s.io/v1beta1'
    cluster.x-k8s.io/v1beta1: v1alpha5_v1alpha6_v1alpha7
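
For completeness, the served and storage versions can also be read straight off the CRD; the jsonpath below is just one way to check, assuming the standard CRD layout:

❯ kubectl get crd openstackclusters.infrastructure.cluster.x-k8s.io -o jsonpath='{range .spec.versions[*]}{.name}{"\t"}served={.served}{"\t"}storage={.storage}{"\n"}{end}'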

The Cluster resource agrees with this too:

❯ kubectl -n magnum-system get cluster/kube-cmd33 -oyaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
...
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
    kind: OpenStackCluster
    name: kube-cmd33-4k46x
    namespace: magnum-system
...

However, with the verbosity turned all the way up, you can see that the update request still goes to v1alpha6 (snipped from the logs):

I1014 02:10:26.010999       1 round_trippers.go:466] curl -v -XPATCH  -H "User-Agent: manager/v1.5.1 cluster-api-controller-manager (linux/amd64) cluster.x-k8s.io/db17cb2" -H "Authorization: Bearer <masked>" -H "Content-Type: application/apply-patch+yaml" -H "Accept: application/json" 'https://10.96.0.1:443/apis/infrastructure.cluster.x-k8s.io/v1alpha6/namespaces/magnum-system/openstackclusters/kube-cmd33-4k46x?fieldManager=capi-topology&force=true'

And because of that, it almost always 'notices' a change and loops endlessly. I tried diffing the current object against what it is sending:

❯ diff -uNr /tmp/currobj.yml /tmp/dryrunapplied.yml
--- /tmp/currobj.yml    2023-10-13 21:45:47
+++ /tmp/dryrunapplied.yml  2023-10-13 21:46:00
@@ -1,13 +1,14 @@
-apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
+apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
 kind: OpenStackCluster
 metadata:
   annotations:
     cluster.x-k8s.io/cloned-from-groupkind: OpenStackClusterTemplate.infrastructure.cluster.x-k8s.io
     cluster.x-k8s.io/cloned-from-name: magnum-v0.9.1
+    topology.cluster.x-k8s.io/dry-run: ""
   creationTimestamp: "2023-07-23T09:03:45Z"
   finalizers:
   - openstackcluster.infrastructure.cluster.x-k8s.io
-  generation: 3429950
+  generation: 3430011
   labels:
     cluster.x-k8s.io/cluster-name: kube-cmd33
     topology.cluster.x-k8s.io/owned: ""
@@ -20,7 +21,7 @@
     kind: Cluster
     name: kube-cmd33
     uid: 0108ecb7-9e6e-4045-a5f0-811a8aade488
-  resourceVersion: "89143700"
+  resourceVersion: "89143889"
   uid: 0abd98ab-6010-43be-b028-44a0df84e597
 spec:
   allowAllInClusterTraffic: false
@@ -154,16 +155,16 @@
   network:
     id: a91dc22f-86fc-4677-938b-f15da173178e
     name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
-    subnets:
-    - cidr: 10.0.0.0/24
+    router:
+      id: dcf60b96-6ceb-42fe-8d17-7f2c1b8b99a8
+      ips:
+      - 46.246.75.135
+      name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
+    subnet:
+      cidr: 10.0.0.0/24
       id: 0ddb7a30-1bcb-4940-83b6-bf91ddadec8b
       name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
   ready: true
-  router:
-    id: dcf60b96-6ceb-42fe-8d17-7f2c1b8b99a8
-    ips:
-    - A.B.C.D
-    name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
   workerSecurityGroup:
     id: c1237980-280d-44de-9ff2-4fe5a4e20d9a
     name: k8s-cluster-magnum-system-kube-cmd33-secgroup-worker

So because there was a change in the OpenStackCluster, and it's pulling v1alpha6 (somehow) while v1alpha7 is the real expected version, it just keeps looping. I feel like there's a spot somewhere that misses pulling the up-to-date version of the infrastructureRef.

What did you expect to happen?

No loops and none of this to happen:

I1014 02:13:28.108463       1 reconcile_state.go:284] "Patching OpenStackCluster/kube-cmd33-4k46x" controller="topology/cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="magnum-system/kube-cmd33" namespace="magnum-system" name="kube-cmd33" reconcileID=2e8e2bc8-cc63-4b35-8c25-b83181bbd883 resource={Group:infrastructure.cluster.x-k8s.io Version:v1alpha6 Resource:OpenStackCluster} OpenStackCluster="magnum-system/kube-cmd33-4k46x"

looping.. non stop...

Cluster API version

Cluster API 1.5.1 + CAPO 0.8.0

Kubernetes version

No response

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug

One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

mnaser commented 1 year ago

I just noticed over here that it's also actually trying to pull v1alpha6:

I1014 02:13:28.108463       1 reconcile_state.go:284] "Patching OpenStackCluster/kube-cmd33-4k46x" controller="topology/cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="magnum-system/kube-cmd33" namespace="magnum-system" name="kube-cmd33" reconcileID=2e8e2bc8-cc63-4b35-8c25-b83181bbd883 resource={Group:infrastructure.cluster.x-k8s.io Version:v1alpha6 Resource:OpenStackCluster} OpenStackCluster="magnum-system/kube-cmd33-4k46x"
mnaser commented 1 year ago

After hours of troubleshooting, I put up an issue and then figured it out: it was the version from the ClusterClass. The mismatch seems to happen when the ClusterClass references an old version while a newer one is installed on the cluster, and then it constantly reconciles.

Kinda feels related to #9384, which I ran into while trying to troubleshoot.
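
To illustrate the shape of the mismatch, here is a sketch of the ClusterClass side (not my actual manifest; the template name comes from the cloned-from annotations above, the ClusterClass name is a placeholder):

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: magnum-v0.9.1          # placeholder name
  namespace: magnum-system
spec:
  infrastructure:
    ref:
      # still pinned to the older apiVersion here, while the installed CRD
      # already serves and stores v1alpha7
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
      kind: OpenStackClusterTemplate
      name: magnum-v0.9.1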

killianmuldoon commented 1 year ago

Thanks for reporting this and following up with your troubleshooting.

Just so I'm clear - you have a v1alpha6 Infrastructure cluster referenced under ClusterClass .spec.infrastructure.ref? The result of this is continuous reconciliation of the Cluster.

Is there an error being returned, or is it just the Patching... line that gets repeated? Generally we wouldn't expect continuous reconciliation here, so there might be a bug.

We could also possibly mitigate the core issue either by automatically updating the references in the ClusterClass to the latest storage version, which is what we do for Clusters and is the subject of #9384, or we could at least return a webhook warning when the version referenced is not the latest storage version.

As in #9384, I think the real solution to this issue will be to remove the apiVersion from these object references in a future version of CAPI, but that requires an API revision.

killianmuldoon commented 1 year ago

To follow up - I wasn't able to reproduce this simply in CAPD. I set the apiVersion to v1alpha4 and the Cluster came up without issue and there doesn't seem to be any runaway reconcile.

It would be really useful to get the actual logs from this instance to get a better understanding of what might be going wrong and how to reproduce it in a test case.

killianmuldoon commented 1 year ago

/triage accepted

killianmuldoon commented 1 year ago

It looks like the API server does produce warnings, at least when the apiVersion is marked as deprecated, so we don't have to introduce that on the CAPI side.

fabriziopandini commented 1 year ago

cc @sbueringer

mnaser commented 1 year ago

Thanks for reporting this and following up with your troubleshooting.

Just so I'm clear - you have a v1alpha6 Infrastructure cluster referenced under ClusterClass .spec.infrastructure.ref? The result of this is continuous reconciliation of the Cluster.

Sorry if my report was a bit all over the place; let me try to tl;dr it:

The key thing is that the referenced template had an older apiVersion than the one in the Cluster (which is automatically 'bumped' by CAPI when it sees a new version available), and this kept looping and looping non-stop.

sbueringer commented 1 year ago

Hm. I'm wondering if there is something wrong in the conversion webhook of the OpenStack cluster.

The controller is not simply comparing v1alpha6 (generated) vs v1alpha7 (retrieved). It runs through a few SSA dry-runs and then checks whether re-applying the generated object would lead to a diff. In general I would have expected this to produce no diff, as the v1alpha6 generated object should go through conversion, defaulting, and so on. Obviously that's not enough here.
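
Roughly (an approximation of the flow, not the controller's actual code path), the comparison is similar to doing a server-side dry-run apply with the topology field manager and diffing the result against the live object:

❯ kubectl -n magnum-system get openstackcluster kube-cmd33-4k46x -o yaml > /tmp/currobj.yml
❯ kubectl -n magnum-system apply --server-side --dry-run=server --field-manager=capi-topology -f generated-openstackcluster.yaml -o yaml > /tmp/dryrunapplied.yml
❯ diff -u /tmp/currobj.yml /tmp/dryrunapplied.yml

(where generated-openstackcluster.yaml stands in for the object the topology controller computes from the ClusterClass)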

@mnaser Do you think there is any way that I can reproduce this locally? I'm thinking about deploying core CAPI + CAPO via our tilt env and then deploying a bunch of YAMLs for the OpenStack cluster. I think to reproduce this effect it's not relevant if the cluster actually comes up. WDYT, would it be possible to provide the YAMLs for the OpenStackCluster?

Q: Do you know if the OpenStack cluster controller is writing the OpenStack cluster object?

I'm not sure if I'll be able to debug this otherwise, this is one of the most complicated areas of Cluster API.

mnaser commented 1 year ago

@sbueringer I think that @mdbooth might have better insights about this.

Otherwise, if you'd like, I can spin up an environment against our public cloud to save you time if you don't have easy access to an OpenStack cloud (or I can provide credentials).

sbueringer commented 1 year ago

Something like this would be helpful. If possible, an env + the YAMLs you're using to hit this issue would be great. I probably won't get to it before KubeCon though.

mdbooth commented 1 year ago

I've been thinking about this since I first raised it a few months ago. As implemented, the apiVersion field is a status field, not a spec field, because it can't really be specified. I think the simplest solution, until we can fix it in an API bump, would be to allow it to be unset. The controller will populate it anyway, so it's still there for any consumers of it, but it doesn't break SSA for the client because there's no owner fight.
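
For illustration only (this is not how the API behaves today, just the shape of the proposal), a ref could then be written without the apiVersion and still come back fully populated on read:

spec:
  infrastructure:
    ref:
      # apiVersion intentionally omitted by the client; the controller would
      # fill it in, so any consumer reading the object still sees it
      kind: OpenStackClusterTemplate
      name: magnum-v0.9.1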

fabriziopandini commented 7 months ago

/priority important-soon

k8s-triage-robot commented 3 months ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Deprioritize it with /priority important-longterm or /priority backlog
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

sbueringer commented 2 months ago

/triage accepted