projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Helm uninstall stuck indefinitely if `tigera-operator` pre-delete job fails to schedule #9220


TheNilesh commented 6 days ago

Expected Behavior

During Helm uninstallation of the tigera-operator chart, the uninstall job (defined as a pre-delete hook) should run and either complete or fail, so that Helm can remove the associated resources instead of hanging indefinitely.

Current Behavior

Helm uninstallation hangs indefinitely if the tigera-operator-uninstall job (the pre-delete hook) cannot be scheduled or fails to run, which happens in particular when the cluster has no active nodes. The uninstallation then never completes and requires manual intervention.

Possible Solution

One solution could be to add configuration options that control the behavior of the pre-delete hook. For example:

Steps to Reproduce (for bugs)

  1. Install the tigera-operator Helm chart on a Kubernetes cluster managed through a hosted control plane using Cluster API (CAPI), Kamaji, and Sveltos.
  2. Remove all nodes from the virtual cluster (leaving only the control plane).
  3. Attempt to uninstall the Helm chart using `helm uninstall tigera-operator -n tigera-operator`.
  4. Observe that Helm becomes stuck, waiting for the uninstall job that never runs due to the lack of nodes.
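To confirm that the hook is blocked on scheduling rather than failing, one could inspect the hook's pod from a second terminal while `helm uninstall` is waiting. The helper below is a sketch under assumptions, not part of the chart: the function names `check_uninstall_hook` and `is_unschedulable` are hypothetical, it assumes the hook Job is named `tigera-operator-uninstall` and runs in the `tigera-operator` namespace, and it relies on the `job-name` label that the Job controller adds to the pods it creates. A pod the scheduler could not place reports a `PodScheduled` condition with reason `Unschedulable`.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers: source this file, then run check_uninstall_hook.

# Succeeds if any argument is the PodScheduled reason "Unschedulable".
is_unschedulable() {
  case " $* " in
    *" Unschedulable "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the pre-delete hook Job is tigera-operator-uninstall in namespace tigera-operator.
check_uninstall_hook() {
  local ns="tigera-operator"
  local job="tigera-operator-uninstall"
  local reasons

  echo "--- nodes ---"
  kubectl get nodes
  echo "--- jobs in ${ns} ---"
  kubectl get jobs -n "$ns"

  # Collect the PodScheduled condition reason of every pod created by the hook Job.
  reasons=$(kubectl get pods -n "$ns" -l "job-name=${job}" \
    -o jsonpath='{.items[*].status.conditions[?(@.type=="PodScheduled")].reason}')

  # Intentionally unquoted so each reason becomes a separate argument.
  # shellcheck disable=SC2086
  if is_unschedulable $reasons; then
    echo "hook pod is Unschedulable; helm uninstall is waiting on it"
  else
    echo "no Unschedulable hook pod found"
  fi
}
```

With no nodes in the cluster, the Job's pod is expected to stay `Pending`, which matches the hang described in step 4.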

Context

This issue affects environments where the Helm chart is installed on a hosted control plane, specifically clusters managed by Cluster API (CAPI), Kamaji, and Sveltos. When no worker nodes are present, the uninstall job (pre-delete hook) cannot be scheduled, so the Helm uninstallation hangs indefinitely. This breaks automated cluster management workflows, which need resources to be removed cleanly even when no nodes are attached to the cluster. In practice, it mainly happens when the tigera-operator installation is managed through a Sveltos cluster profile.

Your Environment

caseydavenport commented 4 days ago

I believe helm has a `--no-hooks` option you can pass to disable the hook when there are no schedulable nodes in the cluster: https://helm.sh/docs/helm/helm_uninstall/

```
  --no-hooks             prevent hooks from running during uninstallation
```

Would that do the trick in your case?
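Following that suggestion, an automation script could fall back to `--no-hooks` only when the cluster reports no nodes, and otherwise run a normal uninstall. This is a minimal sketch under assumptions: the release and namespace names come from the reproduction steps, the function names `needs_no_hooks` and `uninstall_tigera_operator` are hypothetical, and the node count is only a rough proxy for schedulability (it ignores cordoned, tainted, or NotReady nodes). Skipping the hook also skips whatever cleanup the uninstall job would have performed, so that work may need to be handled separately.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: skip Helm hooks only when the cluster has no nodes at all.

# Succeeds when the given node count is zero, i.e. the hook Job could never be scheduled.
needs_no_hooks() {
  [ "$1" -eq 0 ]
}

uninstall_tigera_operator() {
  local release="tigera-operator"
  local ns="tigera-operator"
  local out nodes

  # Fail if the API server cannot be queried, rather than treating that as "no nodes".
  if ! out=$(kubectl get nodes --no-headers 2>/dev/null); then
    echo "could not list nodes; not uninstalling" >&2
    return 1
  fi
  # Count non-empty lines; each line is one Node object (prints 0 for empty output).
  nodes=$(printf '%s' "$out" | grep -c .)

  if needs_no_hooks "$nodes"; then
    echo "no nodes found; running helm uninstall with --no-hooks" >&2
    helm uninstall "$release" -n "$ns" --no-hooks
  else
    helm uninstall "$release" -n "$ns"
  fi
}
```

Source the file and call `uninstall_tigera_operator`. On a cluster that still has nodes it behaves like a plain `helm uninstall`, so the pre-delete hook keeps running in the normal case.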