rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

[BUG] Rancher deletes Cluster API Cluster resources that it did not create #39924

Closed: tmmorin closed this issue 2 months ago

tmmorin commented 1 year ago

Rancher Server Setup

Information about the Cluster

User Information

Admin User

Describe the bug

When one deletes a Cluster.provisioning.cattle.io resource, if a Cluster.cluster.x-k8s.io resource with the same name exists (for example, one a user created while using Cluster API directly), Rancher deletes that resource as well, based solely on name matching and despite having no relationship with it.

To Reproduce

Result

The Cluster.cluster.x-k8s.io resource is deleted.

Expected Result

The Cluster.cluster.x-k8s.io resource should not be deleted.
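A minimal sketch of the collision (illustrative only; the `test1`/`fleet-default` names are taken from the reproduction later in this thread):

```sh
# A CAPI cluster created outside Rancher, plus a Rancher provisioning
# cluster that happens to share its name:
kubectl get clusters.cluster.x-k8s.io -n fleet-default        # test1  Provisioned
kubectl get clusters.provisioning.cattle.io -n fleet-default  # test1  true

# Deleting the Rancher resource...
kubectl delete clusters.provisioning.cattle.io test1 -n fleet-default

# ...also removes the unrelated CAPI resource:
kubectl get clusters.cluster.x-k8s.io -n fleet-default        # No resources found
```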

tmmorin commented 1 year ago

ping @richardcase, as a follow-up to our discussions

richardcase commented 1 year ago

In the remove handler for Cluster.provisioning.cattle.io, if a CAPI Cluster (Cluster.cluster.x-k8s.io) exists with the same name and namespace, it is deleted without checking any ownership reference or labels/annotations:

https://github.com/rancher/rancher/blob/release/v2.7/pkg/controllers/provisioningv2/cluster/remove.go#L95:L106
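For context, the resource metadata shows that a user-created CAPI Cluster carries no ownership markers tying it to Rancher (a hedged sketch; the `test1`/`fleet-default` names come from the repro below, and exactly which markers Rancher should consult is an open design question):

```sh
# A CAPI Cluster created outside Rancher has no ownerReferences (and no
# cattle.io labels) pointing back at a provisioning.cattle.io Cluster:
kubectl get clusters.cluster.x-k8s.io test1 -n fleet-default \
  -o jsonpath='{.metadata.ownerReferences}{"\n"}{.metadata.labels}{"\n"}'
```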

richardcase commented 1 year ago

@tmmorin - also to confirm: do you use embedded-cluster-api=false to disable the embedded CAPI controllers (because you are using the standard CAPI controllers/providers)?

tmmorin commented 1 year ago

> @tmmorin - also to confirm: do you use embedded-cluster-api=false to disable the embedded CAPI controllers (because you are using the standard CAPI controllers/providers)?

yes, indeed

We install Rancher with the official Helm chart, with this in the values:

```yaml
features: embedded-cluster-api=false,provisioningv2=true
```
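(For reference, the same feature flags can also be passed as the `CATTLE_FEATURES` environment variable through the chart's `extraEnv` list, which is the form the repro steps below use; a sketch, not taken from the thread:)

```sh
# Hypothetical equivalent of the values snippet above, using extraEnv:
helm upgrade --install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set 'extraEnv[0].name=CATTLE_FEATURES' \
  --set 'extraEnv[0].value=embedded-cluster-api=false,provisioningv2=true'
```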
richardcase commented 1 year ago

Internal reference: SURE-5768

kkaempf commented 1 year ago

https://github.com/rancher/rancher/pull/40714

sowmyav27 commented 1 year ago

moving it to Review state since the linked PR ^ is not merged.

richardcase commented 1 year ago

@cpinjani - these are the instructions I used.

First, create a Docker bridge network named `kind`:

```sh
docker network ls
docker network create \
      --driver=bridge \
      --subnet=172.19.0.0/16 \
      --gateway=172.19.0.1 \
      --opt "com.docker.network.bridge.enable_ip_masquerade"="true" \
      --opt "com.docker.network.driver.mtu"="1500" \
      kind
```
Save the following k3d config as `k3d-capd.yaml`:

```yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
    name: rancher-test
servers: 1
agents: 0
image: rancher/k3s:v1.25.9-k3s1
network: kind
volumes:
    - volume: /var/run/docker.sock:/var/run/docker.sock
      nodeFilters:
        - server:*
```
Create the cluster, then install cert-manager, the CAPI Docker provider, and Rancher (with the embedded CAPI controllers disabled via `CATTLE_FEATURES`):

```sh
k3d cluster create --config k3d-capd.yaml
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.1/cert-manager.crds.yaml
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1
export EXP_CLUSTER_RESOURCE_SET=true
clusterctl init -i docker
kubectl create namespace cattle-system
helm install rancher rancher-stable/rancher --namespace cattle-system --set bootstrapPassword=admin --set replicas=1 --set hostname=tunnel.fruitcase.dev --set 'extraEnv[0].name=CATTLE_FEATURES' --set 'extraEnv[0].value=embedded-cluster-api=false' --set global.cattle.psp.enabled=false --version 2.7.3
export CONTROL_PLANE_MACHINE_COUNT=1
export WORKER_MACHINE_COUNT=1
export KUBERNETES_VERSION=v1.26.4
```

Then generate and apply a CAPI workload cluster:

```sh
clusterctl generate cluster test1 --target-namespace fleet-default --from https://github.com/capi-samples/opensuse-23/blob/main/templates/kubeadm-docker.yaml > cluster.yaml
kubectl apply -f cluster.yaml
kubectl get machines -A -w
kubectl get clusters.cluster.x-k8s.io -A
```
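To then trigger the bug from these steps, import the cluster into Rancher (which creates a same-named Cluster.provisioning.cattle.io, as the next comment shows) and delete that provisioning resource (a sketch; the outcome matches the validation output below):

```sh
# Delete the Rancher provisioning resource named "test1", then re-check
# the same-named CAPI resource:
kubectl delete clusters.provisioning.cattle.io test1 -n fleet-default
kubectl get clusters.cluster.x-k8s.io -A
```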
cpinjani commented 1 year ago

@richardcase Able to reproduce with the above steps on the reported version 2.6.9; will proceed with validation on the fixed version.

Imported the CAPI cluster into Rancher:

```sh
$ kubectl get clusters.cluster.x-k8s.io -A
NAMESPACE       NAME    PHASE         AGE   VERSION
fleet-default   test1   Provisioned   14m

$ kubectl get Cluster.provisioning.cattle.io -A
NAMESPACE       NAME    READY   KUBECONFIG
fleet-local     local   true    local-kubeconfig
fleet-default   test1   true    test1-kubeconfig
```

After deleting the imported cluster in Rancher:

```sh
$ kubectl get Cluster.provisioning.cattle.io -A
NAMESPACE     NAME    READY   KUBECONFIG
fleet-local   local   true    local-kubeconfig

$ kubectl get clusters.cluster.x-k8s.io -A
No resources found
```
cpinjani commented 1 year ago

As the fix was reverted, moving back to "In-progress". cc: @richardcase

kkaempf commented 1 year ago

too late for 2.7.5, moving to Q3

richardcase commented 1 year ago

We fixed the original issue but, due to a change elsewhere, we were unable to verify it. This was due to another issue, related to the kubeconfig secret, that occurs when the two resources have the same name.

We need to take both aspects into account when fixing this.
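A hedged illustration of the kubeconfig aspect, based on the naming visible in the outputs above (both Cluster API and Rancher use a `<cluster-name>-kubeconfig` secret, so two same-named clusters contend for the same secret):

```sh
# With a standalone CAPI cluster and a Rancher provisioning cluster both
# named "test1", both sides resolve the same kubeconfig secret name:
kubectl get secret test1-kubeconfig -n fleet-default
```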

cpinjani commented 3 months ago

Validation results on v2.8-ef234b38637b8ab5c73f81e2104124e47ae71bd2-head: unable to reproduce this issue. @alexander-demicev Please review the issue for closure.

The Cluster.provisioning.cattle.io resource name and the display name are now different. For example, with the display name cluster1 on the Rancher UI, the resource name is generated:

```sh
$ kubectl get Cluster.provisioning.cattle.io -A
NAMESPACE       NAME      READY   KUBECONFIG
fleet-default   c-wm4qd   true    c-wm4qd-kubeconfig
fleet-local     local     true    local-kubeconfig
```

Steps and results:

- Created a CAPI cluster:

```sh
$ kubectl get clusters.cluster.x-k8s.io
NAME      PHASE         AGE     VERSION
c-67xdp   Provisioned   3m46s
```

- Import the CAPI cluster into Rancher; importing using the resource name is not allowed:

![image](https://github.com/rancher/rancher/assets/73099870/b898d60b-de20-4cb1-bb46-db2dec67ebdb)

- Import using the display name:

```sh
$ kubectl get Cluster.provisioning.cattle.io -A
NAMESPACE       NAME       READY   KUBECONFIG
fleet-default   c-67xdp    true    c-67xdp-kubeconfig
fleet-default   cluster1   true    cluster1-kubeconfig
fleet-local     local      true    local-kubeconfig
```

- Delete the `Cluster.provisioning.cattle.io` resource; the CAPI cluster is intact:

```sh
$ kubectl get clusters.cluster.x-k8s.io
NAME      PHASE         AGE     VERSION
c-67xdp   Provisioned   8m43s
```

salasberryfin commented 2 months ago

As of today, we can no longer reproduce this and consider it solved.