Closed lorelei-rupp-imprivata closed 11 months ago
Hi @lorelei-rupp-imprivata,
I don't see any disruptive changes introduced in v2.10.0
. Could you please share more information to better understand the issue? Please fill in all fields in the issue template such as the current provider version, the resource config, etc.
Thanks.
Sure! @arybolovlev
Terraform v1.2.7 Helm Provider 2.10.0 Kubernetes 1.24
When I run my argocd module with the helm provider 2.9.0 and no other changes I see this in my plan We have an perpetual /infinite diff because of the credentials/secrets set unfortunately
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# helm_release.argocd will be updated in-place
~ resource "helm_release" "argocd" {
id = "argocd"
name = "argocd"
~ values = [
- (sensitive),
+ (sensitive),
]
# (27 unchanged attributes hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
When I just switch the provider to 2.10.0, I see tons of changes on all the helm releases we have in this module, here is just a snippet as its like hundreds of lines
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# helm_release.argocd will be updated in-place
~ resource "helm_release" "argocd" {
id = "argocd"
~ metadata = [
- {
- app_version = "v2.8.4"
- chart = "argo-cd"
- name = "argocd"
- namespace = "argocd"
- revision = 99
- values = jsonencode(
ACTUALLY PRINTS SECRETS OUT IN PLAINTEXT HERE
.........
)
- version = "5.46.8"
},
] -> (known after apply)
name = "argocd"
~ values = [
- (sensitive),
+ (sensitive),
]
# (26 unchanged attributes hidden)
}
..........
# helm_release.dex-pdb must be replaced
-/+ resource "helm_release" "dex-pdb" {
~ id = "dex-pdb" -> (known after apply)
+ manifest = (known after apply)
~ metadata = [
- {
- app_version = ""
- chart = "pod-disruption-budget"
- name = "dex-pdb"
- namespace = "argocd"
- revision = 1
- values = jsonencode(
{
- matchLabels = {
- "app.kubernetes.io/component" = "dex-server"
- "app.kubernetes.io/name" = "argocd-dex-server"
}
- name = "dex-pdb"
- podDisruptionBudget = {
- minAvailable = 1
}
}
)
- version = "1.0.0"
},
] -> (known after apply)
name = "dex-pdb"
~ namespace = "argocd" -> (known after apply) # forces replacement
# (25 unchanged attributes hidden)
}
Plan: 6 to add, 1 to change, 6 to destroy.
The actual module code for like the helm release dex-pdb looks like this
resource "helm_release" "dex-pdb" {
depends_on = [
helm_release.argocd
]
create_namespace = false
chart = "../common-helm-charts/pod-disruption-budget"
name = "dex-pdb"
version = local.pdb_chart_version
namespace = helm_release.argocd.metadata.0.namespace
values = [
yamlencode(local.chart_variables_dex-pdb)
]
}
It is just using a simple helm chart to install a PDB, nothing fancy, but I think it may be changing because it uses the argocd output as a dependency and that is changing so it might be forcing this to change
The argocd helm release looks like this
resource "helm_release" "argocd" {
name = "argocd"
create_namespace = true
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = var.argocd_version
namespace = var.namespace
values = [
yamlencode(local.chart_variables)
]
}
Its using argos helm chart passing in all the values overrides we do, some have secret values in them
I also just tried this on another module we have for another helm release and it does not show anything destructive, but that one doesn't use secrets, maybe it has to do with this sensitive data stuff perhaps?
I am really scratching my head, happy to provide more things if you need it let me know
It appears like its recomputing the metadata on this, but nothing has changed. The 2.10.0 release notes are about recomputing the metadata, and that is what this is doing, but this is super destructive for us, its going to destroy argo and recreate it, which is not something we can do. I don't understand why its recomputing it though if nothing has changed on our side
actually this looks related to my issue https://github.com/hashicorp/terraform-provider-helm/issues/1221 and https://github.com/hashicorp/terraform-provider-helm/issues/1150 and I will say that the Plan shows ALL the senstive data now and it shouldn't, it used to not, when on 2.10.0, when it shows its going to be destructive and destroy/recreate argocd, it shows and prints in plaintext all the secrets out
Hi @lorelei-rupp-imprivata,
Thank you for the detailed explanation and for sharing the code snippets.
The issue you encounter is indeed related to the changes introduced in v2.10.0
. The metadata
attribute gets recomputed when some attributes are changed. In your case, it is the values
one. Since metadata
is the computed attribute Terraform marks it as unknown. You can see it from the (known after apply)
message. This is what happens with your argocd
resource.
In your dex-pdb
resource you refer to argocd
one via helm_release.argocd.metadata.0.namespace
and that builds dependency. As we figured out above the entire metadata
will be recomputed and the namespace
attribute will be marked as unknown. It will become known only when changes are applied for argocd
resource. That is why Terraform wants to replace the namespace
attribute in resource dex-pdb
even though it will be the same value. Changing the namespace
attribute will always cause resource replacement, since it has the ForceNew
attribute set to true
.
In your case, the way to address the issue would be to change the referred attribute from metadata.0.namespace
to namespace
. For example:
resource "helm_release" "dex-pdb" {
...
namespace = helm_release.argocd.namespace
...
}
In this case, you still keep dependency and these two resources will be applied sequentially.
I hope it helps.
Thanks!
@arybolovlev Thanks for the response! I will definitely update to use the .namespace. I had originally thought it might not do the ordering properly because the namespace is computed before its actually created and sometimes terraform optimizes, but that is good to know, I will switch to this.
So for the argocd helm release issue, the problem is the values is NOT changing. We have a perpetual diff for some reason because of sensitive data. SO nothing is actually really changing. So this actually makes the perpetual diff even worse if we go to 2.10.0, because its going to destroy/recreate the release. Why does helm think something is changing when it isn't? Is that an entirely separate bug then?
also with 2.10.0, it prints out in plain text all the secret /sensitive information, when it didn't prior. I think that should warrant a release note too perhaps, because if we ran this in prod, we would have to then go and rotate all our secrets likely because they would have been plain text dumped out, that seems like a bug as well?
If I understand you right, you refer to the following change:
~ values = [
- (sensitive),
+ (sensitive),
]
I need additional context here. From your code, I understand that values
get rendered from the local value chart_variables
. It looks like one of the values there marked as sensitive, that is why Terraform marks values
as sensitive. I wonder if any value in chart_variables
has a dynamic nature? For instance, it get a value from a data source.
I have tested it with the following code and don't observe perpetual diff. I only get diff when update variables:
variable "namespace" {
default = "this"
}
variable "replicas" {
default = 3
sensitive = true
}
variable "autoscaling" {
default = true
}
locals {
chart_variables = {
"server": {
"replicas": var.replicas,
"autoscaling": {
"enabled": var.autoscaling
}
}
}
}
resource "helm_release" "argocd" {
name = "argocd"
create_namespace = true
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = "v5.51.0"
namespace = var.namespace
values = [
yamlencode(local.chart_variables)
]
}
Let me check on the local variable in use and get you some details here!
@arybolovlev Our local.chart_variables is really large, but after scrolling through it here were the points of interest, ie things that were not just strings, or variables passing through
.......
"clusterRoleRules" = {
"enabled" = "true"
"rules" = yamldecode(file("${path.module}/templates/controller-clusterrole.yaml"))
}
.......
"resource.inclusions" = file("${path.module}/templates/resource-inclusions.yaml")
"url" = "https://${var.argocd_domain}"
"dex.config" = templatefile("${path.module}/templates/dex.tmpl", { argo_domain = var.argocd_domain })
"ui.bannercontent" = var.banner_content
"ui.bannerurl" = var.banner_url
}
"rbacConfig" = {
"policy.csv" = templatefile("${path.module}/templates/rbac.tmpl", { rbac_entries = var.argocd_rbac_entries })
"policy.default" = "role:readonly"
"scopes" = "[groups]"
}
....
"service" = {
"type" = "LoadBalancer"
"annotations" = {
"external-dns.alpha.kubernetes.io/hostname" = var.argocd_domain
"service.beta.kubernetes.io/aws-load-balancer-extra-security-groups" = aws_security_group.allow_vpn.id
}
"loadBalancerSourceRanges" = [var.vpc_cidr_block]
}
I can gist the entire thing if you are interested, but there are no datas, and the only resource we use is a security group. We do a lot of templatefile/ file calls and a yamldecode on a file. It has been doing this perpetual diff for a while now, I think honestly when we made the variables sensitive perhaps
Thanks for sharing this. If there are no dynamic values, then I guess we have only one option left. There is a difference between values
in the state file and the one that is produced by yamlencode(local.chart_variables)
. Any change in template files will cause this diff. Even if you add a space symbol. Since it is marked as sensitive, there is no way to see this diff during the planning stage. I can propose possible troubleshooting steps that should help identify why this diff happens.
I understand this is a production environment, so I would advise you to make a copy of your code and play with it locally in a different folder.
You can copy only variables that you use local.chart_variables
and remove the sensitive attribute from them, then the chart_variables
itself and all template files. You don't need the helm_release
resource or any other resources. In the case of external resources, like aws_security_group.allow_vpn.id
-- you can replace it with a variable or even hard code it. The next step would be to run the Terraform console and call yamlencode()
function. You need to do this in the same folder where you keep a copy of your code. Something like this:
$ terraform console
> yamlencode(local.chart_variables)
<<EOT
"rbac": |
create: true
"server":
"autoscaling":
"enabled": true
"replicas": 3
EOT
Or, you can add an output and run terraform plan
:
output "debug" {
value = yamlencode(local.chart_variables)
}
$ terraform plan
...
Changes to Outputs:
+ debug = <<-EOT
"rbac": |
create: true
"server":
"autoscaling":
"enabled": true
"replicas": 3
EOT
The goal here is to get a YAML document that gets rendered by yamlencode()
. Then, you need to compare this output with the values
in your original code. Since it is marked as sensitive, you won't be able to get it via terraform state show helm_release.argocd
but you can get it from the state file. For example:
"values": [
"\"rbac\": |\n create: true\n\"server\":\n \"autoscaling\":\n \"enabled\": true\n \"replicas\": 3\n"
],
Then use echo
in your shell to print it out nicely:
$ echo "\"rbac\": |\n create: true\n\"server\":\n \"autoscaling\":\n \"enabled\": true\n \"replicas\": 3\n"
"rbac": |
create: true
"server":
"autoscaling":
"enabled": true
"replicas": 3
If you compare outputs, it should be a difference between the output you render from the copy of your code and the actual state.
It might be invisible or hard to recognize symbols, so use a tool to get a diff.
Maybe there is an easy way to get diff, but this is the first one that came to my mind for this case.
Thanks ill look into this to see what it says, its just so odd that it thinks there is a change when there isn't
Ok I was able to run this and compare in intelij
some things I noticed were
email:
subject: Application {{.app.metadata.name}} sync status is 'Unknown'
message: |
{{if eq .serviceType "slack"}}:exclamation:{{end}} Application {{.app.metadata.name}} sync is 'Unknown'.
Application details: {{.context.argocdUrl}}/applications/{{.app.metadata.name}}.
{{if ne .serviceType "slack"}}
{{range $c := .app.status.conditions}}
* {{$c.message}}
{{end}}
{{end}}
slack:
.....
In the state the variables like $c
is just empty like an empty space
These are loaded in here
"templates" = {
"template.app-deployed" = file("${path.module}/templates/app-deployed.tmpl")
"template.app-health-degraded" = file("${path.module}/templates/health-degraded.tmpl")
"template.app-sync-failed" = file("${path.module}/templates/sync-failed.tmpl")
"template.app-sync-running" = file("${path.module}/templates/sync-running.tmpl")
"template.app-sync-status-unknown" = file("${path.module}/templates/status-unknown.tmpl")
"template.app-sync-succeeded" = file("${path.module}/templates/sync-succeeded.tmpl")
}
"triggers" = {
"trigger.on-deployed" = file("${path.module}/templates/trigger-on-deployed.tmpl")
"trigger.on-health-degraded" = file("${path.module}/templates/trigger-on-health-degraded.tmpl")
"trigger.on-sync-failed" = file("${path.module}/templates/trigger-on-sync-failed.tmpl")
"trigger.on-sync-running" = file("${path.module}/templates/trigger-on-sync-running.tmpl")
}
I am not sold though these are the problem
I opened this ticket here https://github.com/hashicorp/terraform-provider-helm/issues/1295 for the perpetual diff issue
after solving our perpetual diff issue and moving sensitive values to the set_sensitive block we no longer see an issue here
We were looking to upgrade to the latest provider, but noticed in doing so with the 2.10.0 release and above all the plans appear to be very destructive to helm resources already in production
When updating our provider to 2.10.0, we see plans that show every helm release needs to be replaced. The Release notes https://github.com/hashicorp/terraform-provider-helm/releases/tag/v2.10.0 do not mention anything of the sort.
Is this intentional? I cannot imagine we are the only one that would be impacted here and this prevents us from ever being able to upgrade. Is there a migration path so its not destructive? Is there more details on what is going to happen perhaps?
This is what the plan now shows for example