kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0
290 stars 417 forks

Self-Hosted cluster deletion failing in scenarios when manual deletion is attempted first. #4609

Open ardixit-msft-la opened 5 months ago

ardixit-msft-la commented 5 months ago

/kind bug

What steps did you take and what happened: When manual deletion of a self-hosted cluster is attempted before deleting it through the kubectl command, the cluster never gets deleted. On reattempting the manual deletion, the resources are recreated while the provisioning state of the cluster is shown as Deleting.

kubectl --kubeconfig C:\Users\ardixit.kube\management get clusters

[image]

What did you expect to happen: After the deletion is attempted through the following command

kubectl --kubeconfig C:\Users\ardixit.kube\management delete cluster flexiblec220849

The cluster should be deleted instead of hanging in the Deleting state. The cluster should not be recreated.

Anything else you would like to add: This issue was reproduced with an AKS setup with 4 clusters, each having 5 user node pools. The issue is intermittent and occurs in approximately 2 out of 10 runs.

jackfrancis commented 5 months ago

/assign @nawazkh

@ardixit-msft-la could you be more specific about what "manual deletion of self-hosted cluster is attempted before deleting through kubectl command" means? That way we can repro.

  1. Create an AKS cluster managed by CAPZ
  2. Manually delete <something???>
  3. Attempt to delete cluster object from CAPI mgmt cluster
  4. Observe that cluster gets stuck in Deleting state

Specifically, we need more info on repro step 2 above.

Thanks!

cc @nojnhuh

ardixit-msft-la commented 5 months ago

Correct.

Here are the steps.

  1. Create a Self-Hosted cluster managed by CAPZ
  2. Manually delete the resource group hosting the self-hosted cluster from Azure portal
  3. Attempt to delete cluster object from CAPI mgmt cluster
  4. Observe that cluster gets stuck in Deleting state

nawazkh commented 5 months ago

@ardixit-msft-la Sorry for the delay.

I don't see the erroneous behavior when following the steps shared above. I am using CAPZ main and K8s v1.28.5 for the repro. Can you share the versions of CAPZ and Kubernetes where you see this error?

Also, can you describe the steps in more detail? Are steps 2 and 3 performed immediately one after the other, or are you waiting for step 2 to finish before executing step 3?
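The two timings asked about here can be written down as a small script, which may make the repro easier to compare across environments. This is only a sketch under stated assumptions: the original report deleted the resource group through the Azure portal, and the Azure CLI command below is a swapped-in equivalent. The function names run and repro_delete, and all arguments, are hypothetical placeholders. With DRY_RUN left at its default of 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of repro steps 2 and 3. All names and arguments are placeholders.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: repro_delete RESOURCE_GROUP CLUSTER_NAME KUBECONFIG [wait|nowait]
#   wait   : step 2 finishes before step 3 starts (default)
#   nowait : step 3 starts immediately after step 2 is submitted
repro_delete() {
  resource_group=$1
  cluster_name=$2
  kubeconfig=$3
  wait_mode=${4:-wait}

  # Step 2: delete the resource group directly in Azure, outside of CAPZ.
  # (Assumption: az CLI stands in for the portal deletion from the report.)
  if [ "$wait_mode" = "nowait" ]; then
    run az group delete --name "$resource_group" --yes --no-wait
  else
    run az group delete --name "$resource_group" --yes
  fi

  # Step 3: delete the Cluster object from the CAPI management cluster.
  run kubectl --kubeconfig "$kubeconfig" delete cluster "$cluster_name"
}
```

Calling repro_delete with nowait versus wait as the fourth argument covers both orderings; setting DRY_RUN=0 would execute the commands for real, which deletes resources.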

nawazkh commented 5 months ago

Retried with CAPZ v1.13.2 and K8s v1.28.5 but could not repro this issue.

ardixit-msft-la commented 5 months ago

I am using k8s version 1.26.0, and I have a live environment with this setup. Please let me know if that would be useful; I can help you with it.

nawazkh commented 5 months ago

Can you please share the CAPZ version as well?

ardixit-msft-la commented 5 months ago

I am not sure where or how I can find the CAPZ version. Can you please help?

nawazkh commented 5 months ago

One way is to get the version from the capz-controller-manager-xyz pod in the management cluster (for example, from the tag on its container image).
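As a rough sketch of that lookup: the helper below reads the container image of the controller deployment and prints its tag. The namespace capz-system and the deployment name capz-controller-manager are assumed defaults of a standard install and may differ in a given environment. The function names image_tag and capz_version are hypothetical, introduced only for this example.

```shell
# Print the tag of a container image reference, i.e. the text after the last ':'.
# Limitation: does not handle digest-pinned references (name@sha256:...) or
# untagged images on a registry with a port (host:5000/name).
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Print the image tag(s) of the CAPZ controller deployment.
# Assumptions: namespace "capz-system" and deployment "capz-controller-manager".
# Pass the path to the management cluster kubeconfig as the first argument.
capz_version() {
  kubectl --kubeconfig "$1" -n capz-system get deployment capz-controller-manager \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}' |
  while IFS= read -r image; do
    image_tag "$image"
  done
}
```

Running capz_version with the management kubeconfig path prints one tag per container in the deployment.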

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten