kubernetes-sigs / cluster-api-addon-provider-helm

CAAPH uses Helm charts to manage the installation and lifecycle of Cluster API add-ons.
Apache License 2.0

Add `ensure` Option to `spec.uninstall` for Ensuring Proper Cleanup of HelmChartProxy Resources #319

Open kahirokunn opened 2 days ago

kahirokunn commented 2 days ago

User Story

As an operator, I would like an option to ensure that HelmChartProxy resources are uninstalled, so that external resources are properly cleaned up during cluster deletion and do not leave orphaned resources or cause unexpected behavior.

Detailed Description

I propose adding a new option, `ensure`, to the uninstall options of HelmChartProxy (`spec.options.uninstall` in the example below) to control its uninstallation behavior. When this option is enabled, the corresponding Helm chart is guaranteed to be uninstalled before cluster deletion completes. The default value of `ensure` should be `false`, so that users explicitly opt into this behavior.

An example configuration:

```yaml
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: HelmChartProxy
metadata:
  name: karpenter
spec:
  clusterSelector:
    matchLabels:
      karpenterChart: enabled
  repoURL: oci://public.ecr.aws/karpenter
  chartName: karpenter
  options:
    waitForJobs: true
    wait: true
    timeout: 5m
    install:
      createNamespace: true
    uninstall:
      ensure: true  # New option (default: false)
  valuesTemplate: |
    controller:
      replicaCount: 2
```
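
To make the proposed default concrete, here is a minimal sketch of how a controller might read the flag from a parsed HelmChartProxy manifest. The `ensure` field does not exist in CAAPH today, and the helper name `uninstall_ensure_enabled` is hypothetical; the field path simply follows the example above.

```python
from typing import Any, Mapping


def uninstall_ensure_enabled(proxy: Mapping[str, Any]) -> bool:
    """Return True only if the proposed (hypothetical) ensure flag is explicitly true.

    A missing section or a missing flag falls back to False, which matches
    the proposed default.
    """
    spec = proxy.get("spec") or {}
    options = spec.get("options") or {}
    uninstall = options.get("uninstall") or {}
    return uninstall.get("ensure", False) is True


# A HelmChartProxy that opts in (parsed form of the relevant YAML fields).
opted_in = {
    "kind": "HelmChartProxy",
    "spec": {"options": {"uninstall": {"ensure": True}}},
}
# A HelmChartProxy without uninstall options keeps the current behavior.
default = {"kind": "HelmChartProxy", "spec": {"options": {"wait": True}}}

print(uninstall_ensure_enabled(opted_in))  # True
print(uninstall_ensure_enabled(default))   # False
```

Treating anything other than an explicit `true` as disabled keeps existing HelmChartProxy objects unaffected if the field were introduced.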

Challenges

Currently, the following issues can arise during cluster deletion, making proper cleanup difficult:

  1. Cluster Resource Deletion Before HelmChartProxy Uninstallation
    If the Cluster resource is deleted before HelmChartProxy has been fully uninstalled, external resources created by the Helm chart may remain orphaned, leading to inconsistencies and potential security risks.

  2. Concurrency Between HelmChartProxy Uninstallation and Cluster Deletion
    When Cluster deletion starts while the HelmChartProxy is still uninstalling, the Cluster may be removed before the uninstallation finishes. This can result in incomplete cleanup of resources.

To address these challenges, I suggest leveraging the Cluster API [Runtime Hooks for Add-on Management](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md). Specifically, the BeforeClusterDelete hook can be used to:
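
As one illustration, a BeforeClusterDelete handler could keep Cluster deletion blocked while any release that opted into `ensure` is still installed. The sketch below assumes the hook response carries `status`, `message`, and `retryAfterSeconds` fields, where a non-zero `retryAfterSeconds` asks Cluster API to call the hook again later. The `Release` type, the `before_cluster_delete` function, and the 30-second interval are hypothetical and not part of CAAPH.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union

RETRY_INTERVAL_SECONDS = 30  # hypothetical polling interval


@dataclass(frozen=True)
class Release:
    """Hypothetical view of one Helm release installed for a cluster."""
    name: str
    ensure_uninstall: bool  # value of the proposed ensure option
    uninstalled: bool       # True once the Helm uninstall has finished


def before_cluster_delete(releases: Iterable[Release]) -> Dict[str, Union[str, int]]:
    """Build a hook response that blocks deletion until ensured releases are gone."""
    pending = sorted(
        r.name for r in releases if r.ensure_uninstall and not r.uninstalled
    )
    if pending:
        # Non-zero retryAfterSeconds: ask to be called again, deletion stays blocked.
        return {
            "status": "Success",
            "retryAfterSeconds": RETRY_INTERVAL_SECONDS,
            "message": "waiting for uninstall of: " + ", ".join(pending),
        }
    # Zero means this hook no longer blocks, so deletion may proceed.
    return {"status": "Success", "retryAfterSeconds": 0, "message": ""}


releases = [
    Release("karpenter", ensure_uninstall=True, uninstalled=False),
    Release("metrics-server", ensure_uninstall=False, uninstalled=False),
]
print(before_cluster_delete(releases)["message"])
```

In this sketch, releases that do not set `ensure` never delay deletion, so the current behavior is preserved for charts that have not opted in.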

Other Information

/kind feature

Jont828 commented 17 hours ago

/triage-accepted

Will take a look on Thurs.