hashicorp / terraform-provider-helm

Terraform Helm provider
https://www.terraform.io/docs/providers/helm/
Mozilla Public License 2.0
997 stars 367 forks source link

Perpetual diff on Deployment and crash on apply for OCI chart with manifest experiment enabled #1402

Open Roberdvs opened 2 months ago

Roberdvs commented 2 months ago

Terraform, Provider, Kubernetes and Helm Versions

Terraform version: 1.5.3
Provider version: >= 2.13.0
Kubernetes version: EKS 1.29

Affected Resource(s)

Terraform Configuration Files


provider "helm" {
  experiments {
    manifest = true
  }
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name, "--region", var.aws_region, "--role-arn", var.aws_role_arn]
    }
  }
}

resource "helm_release" "release" {

  name       = "kubecost"
  chart      = "cost-analyzer"
  repository = "oci://public.ecr.aws/kubecost"
  version          = "2.3.2"
  namespace        = "kubecost"
  create_namespace = true
}

Debug Output

NOTE: In addition to Terraform debugging, please set HELM_DEBUG=1 to enable debugging info from helm.

╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.kubecost.helm_release.release to include
│ new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/helm" produced an invalid new value for
│ .manifest: was
...
OMITTED, too long to paste here
...
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

Panic Output

Steps to Reproduce

I have seen this with Kubecost OCI chart, but might be reproducible with others

  1. Enable manifest diff experiment in the provider
  2. Run terraform apply

If the resource already exists, it shows this perpetual diff on every plan

image

and then crashes on apply with the error above

References

This got released on 2.13 and is probably related:

Community Note

linuxdaemon commented 1 month ago

EDIT: Never mind, this label is part of my specific chart's template, it is just using .Release.Revision and helm computes the templates with the revision incremented for diffs.

I can confirm this is the case on 2.14.0, terraform 1.9.2, though I'm seeing a diff on the helm-revision label. Definitely seems to be affecting OCI charts, specifically my resource definition was:

resource "helm_release" "volsync" {
  create_namespace = true
  atomic           = true
  cleanup_on_fail  = true

  chart     = "oci://tccr.io/truecharts/volsync"
  name      = "volsync"
  namespace = "volsync"

  version = "2.2.0"

  values = [
    yamlencode({
      metrics = {
        main = {
          enabled = false
        }
      }
    })
  ]

  depends_on = [helm_release.snapshot_controller]
}

Looking through the terraform state, it seems that on OCI charts the state with manifests stores the helm-revision label for each resource, but non-OCI charts seem not to store it, which I assume leads to the state desync.

Actually, I just tested this with the helm CLI and it seems that the issue stems from there, in a dry-run upgrade of the OCI chart, resources all have the helm-revision label (incremented from the live value by one), but the non-OCI chart does not show the helm-revision label at all. So this may actually be a core helm issue

wondersd commented 1 month ago

Ive noticed this too in other truechart helm charts. They all seem to use this pattern of embedding the .Release.Revision as a pod label, making none of them compatible with manifest. I've also come across this type of pattern in other charts as well, seems to be a not terribly uncommon practice or perhaps one growing in popularity.

the kubernetes_manifest provider can have similar problems when it comes to things like labels being injected onto resources post creation and has a work around with computed fields

Perhaps the same tact could be taken here where a yq style path can be used to denote fields that will always be different so they can be ignored for purposes of computing differences.

resource "helm_release" "this" {
  ...
  computed_fields = [
    "kubecost/deployment.apps/apps/v1/kubecost-cost-analyzer.spec.template.metadata.labels.helm-rollout-restarter"
  ]
}