rancher / turtles

Rancher CAPI extension
https://turtles.docs.rancher.com
Apache License 2.0

Cluster getting re-imported after removal from Rancher #589

Closed cpinjani closed 2 months ago

cpinjani commented 3 months ago

What steps did you take and what happened?

https://github.com/rancher/turtles/assets/73099870/e22a5858-daf8-4adf-8fbf-7939447b7c6c

Is there a race condition between annotating the cluster as imported and its removal? Cluster description after it got re-imported:

Name:         cluster1
Namespace:    default
Labels:       app.kubernetes.io/managed-by=Helm
              objectset.rio.cattle.io/hash=d8a48a02f9251fc48f93df5bf362b6004500d2a9
Annotations:  imported: true
              meta.helm.sh/release-name: clusters-clusters
              meta.helm.sh/release-namespace: default
              objectset.rio.cattle.io/id: default-clusters-clusters-cattle-fleet-local-system
API Version:  cluster.x-k8s.io/v1beta1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-07-03T12:27:13Z
  Finalizers:
    cluster.cluster.x-k8s.io
  Generation:        2
  Resource Version:  156174
  UID:               6ed57ce2-c6dc-4933-b243-5fb30d57f4d7
Spec:
  Cluster Network:
    Pods:
      Cidr Blocks:
        10.1.0.0/16
    Service Domain:  cluster.local
    Services:
      Cidr Blocks:
        10.10.0.0/16
  Control Plane Endpoint:
    Host:  172.19.0.2
    Port:  6443
  Control Plane Ref:
    API Version:  controlplane.cluster.x-k8s.io/v1beta1
    Kind:         RKE2ControlPlane
    Name:         cluster1-control-plane
    Namespace:    default
  Infrastructure Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1beta1
    Kind:         DockerCluster
    Name:         cluster1
    Namespace:    default
Status:
  Conditions:
    Last Transition Time:  2024-07-03T12:30:51Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-07-03T12:29:58Z
    Status:                True
    Type:                  ControlPlaneInitialized
    Last Transition Time:  2024-07-03T12:30:51Z
    Status:                True
    Type:                  ControlPlaneReady
    Last Transition Time:  2024-07-03T12:27:59Z
    Status:                True
    Type:                  InfrastructureReady
  Control Plane Ready:     true
  Infrastructure Ready:    true
  Observed Generation:     2
  Phase:                   Provisioned
Events:
  Type    Reason               Age                 From                Message
  ----    ------               ----                ----                -------
  Normal  Provisioning         31m (x2 over 31m)   cluster-controller  Cluster cluster1 is Provisioning
  Normal  InfrastructureReady  30m (x12 over 30m)  cluster-controller  Cluster cluster1 InfrastructureReady is now true
  Normal  Provisioned          30m (x11 over 30m)  cluster-controller  Cluster cluster1 is Provisioned

What did you expect to happen?

Cluster should not be re-imported after removal.

How to reproduce it?

Steps mentioned above.

Rancher Turtles version

v0.9.0

Anything else you would like to add?

The behavior is mostly seen with v3 clusters, not v1.

Label(s) to be applied

/kind bug

furkatgofurov7 commented 2 months ago

I spent some time trying to reproduce this, but with no luck. After removing the imported cluster, it was always gone and was never re-imported (which is the expected behavior). The issue does mention that it is not always reproducible, though, so maybe I just got lucky on each try 🙂

cpinjani commented 2 months ago

On v0.9.1, the issue still occurs sporadically, and after the v3 cluster is re-imported, the user is unable to remove it from Rancher. @Danil-Grigorev PTAL.

Cluster re-imported: image

Attempt for removal after re-import: image

Cluster details Rancher:

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":[],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: v1.26.4+rke2r1
    provisioner.cattle.io/ke-driver-update: updated
  creationTimestamp: '2024-07-24T05:21:12Z'
  deletionGracePeriodSeconds: 0
  deletionTimestamp: '2024-07-24T05:53:40Z'
  finalizers:
    - capicluster.turtles.cattle.io
    - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 30
  labels:
    cluster-api.cattle.io/capi-cluster-owner: cluster1
    cluster-api.cattle.io/capi-cluster-owner-ns: default
    cluster-api.cattle.io/owned: ''
    provider.cattle.io: rke2

CAPI:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    imported: 'true'
    meta.helm.sh/release-name: clusters-clusters
    meta.helm.sh/release-namespace: default
    objectset.rio.cattle.io/id: default-clusters-clusters-cattle-fleet-local-system
  creationTimestamp: '2024-07-24T05:12:39Z'
  finalizers:
    - cluster.cluster.x-k8s.io
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    objectset.rio.cattle.io/hash: d8a48a02f9251fc48f93df5bf362b6004500d2a9
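
As a reading aid for the Rancher manifest above: Kubernetes keeps an object that has a `deletionTimestamp` until its `finalizers` list is empty, so the two remaining finalizers are what keep the v3 cluster visible. The following is a minimal sketch of that check, not Turtles or Rancher code. The helper name `blocking_finalizers` is hypothetical, and the dictionary is a hand-copied subset of the manifest.

```python
def blocking_finalizers(manifest: dict) -> list:
    """Return the finalizers still holding an object that is being deleted.

    Hypothetical helper for illustration only. An object without a
    deletionTimestamp is not being deleted, so nothing is "blocking" it.
    """
    meta = manifest.get("metadata", {})
    if not meta.get("deletionTimestamp"):
        return []
    return list(meta.get("finalizers") or [])


# Subset of the management.cattle.io/v3 Cluster manifest shown above.
rancher_cluster = {
    "apiVersion": "management.cattle.io/v3",
    "kind": "Cluster",
    "metadata": {
        "deletionTimestamp": "2024-07-24T05:53:40Z",
        "finalizers": [
            "capicluster.turtles.cattle.io",
            "controller.cattle.io/mgmt-cluster-rbac-remove",
        ],
    },
}

pending = blocking_finalizers(rancher_cluster)
print(pending)
```

The sketch only lists which finalizers are pending. It does not say which controller failed to remove its finalizer or why.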

cpinjani commented 2 months ago

Test results on the latest dev build, f4a20bb (logs.tar.gz): the v3 cluster is still getting re-imported, but the user is able to remove it from Rancher after the re-import. It's not a blocker; keeping the issue open for further investigation in a future release.

Details: Rancher:

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":[],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
  creationTimestamp: '2024-07-26T06:47:17Z'
  finalizers:
    - capicluster.turtles.cattle.io
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
    - wrangler.cattle.io/mgmt-cluster-remove

CAPI:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    imported: 'true'
    meta.helm.sh/release-name: clusters-tests-assets-rancher-turtles-fleet-exa-9a024
    meta.helm.sh/release-namespace: default
    objectset.rio.cattle.io/id: default-clusters-tests-assets-rancher-turtles-fleet-exa-9-f0c99
  creationTimestamp: '2024-07-26T06:42:51Z'
  finalizers:
    - cluster.cluster.x-k8s.io
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    cni: cluster1-crs-0
    objectset.rio.cattle.io/hash: daa87f7f877debd2e213157ee8060e79a1590d93

https://github.com/user-attachments/assets/284dcac9-f311-457f-8905-471b26279cde

Danil-Grigorev commented 2 months ago

This now looks like a caching issue. We set the imported annotation on deletion, and any event that arrives close to that point has a chance of getting a “previous” CAPI Cluster object with no imported annotation. We need to start using an uncached client for these scenarios.
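
To make the suspected race concrete, here is a toy model of it in plain Python. It is a sketch under assumptions, not the Turtles implementation: `ApiServer`, `CachedClient`, and `should_import` are hypothetical names, and the cache is refreshed by an explicit `sync` call rather than by a real watch. The only detail taken from the thread is the `imported: "true"` annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    annotations: dict = field(default_factory=dict)


class ApiServer:
    """Stands in for the API server: the source of truth."""

    def __init__(self):
        self._objects = {}

    def put(self, cluster: Cluster) -> None:
        self._objects[cluster.name] = Cluster(cluster.name, dict(cluster.annotations))

    def get(self, name: str) -> Cluster:
        obj = self._objects[name]
        return Cluster(obj.name, dict(obj.annotations))


class CachedClient:
    """Stands in for a cache-backed client that can lag behind the server."""

    def __init__(self, server: ApiServer):
        self._server = server
        self._cache = {}

    def sync(self, name: str) -> None:
        # In this model the cache only catches up when sync() is called.
        self._cache[name] = self._server.get(name)

    def get(self, name: str) -> Cluster:
        obj = self._cache[name]
        return Cluster(obj.name, dict(obj.annotations))


def should_import(cluster: Cluster) -> bool:
    """Hypothetical decision: import only clusters not yet marked as imported."""
    return cluster.annotations.get("imported") != "true"


server = ApiServer()
server.put(Cluster("cluster1"))
cached = CachedClient(server)
cached.sync("cluster1")

# The deletion path marks the CAPI Cluster as imported on the server...
current = server.get("cluster1")
current.annotations["imported"] = "true"
server.put(current)

# ...but an event handled before the cache catches up sees the old object.
stale_decision = should_import(cached.get("cluster1"))  # cached read
fresh_decision = should_import(server.get("cluster1"))  # uncached read
print(stale_decision, fresh_decision)
```

In this model the cached read still sees the cluster without the annotation and would import it again, while the direct read sees the annotation and skips it. In controller-runtime, `Manager.GetAPIReader()` returns a reader that talks to the API server directly, which is one way an uncached read could be done. Whether the eventual fix uses it is not stated in this thread.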