Open rmb938 opened 3 years ago
Are there any workarounds for this?
For instance, the GitLab Helm chart creates a StatefulSet. If the chart is upgraded, or a value is changed, and the upgrade fails, I would in no way want to destroy and re-create my production deployment. I would expect to be able to roll back and re-deploy.
We use TF for everything here, so rolling out our chart deployments with TF made sense, but if we have no way to recover a failed deployment I am really hesitant to go this route. I'm much more familiar with Flux or helmfile for managing the deployments.
Is there any fix for this?
We are using version 2.2.0 of the Helm provider. Every time the task fails for any reason, Terraform tries to destroy the deployment and recreate it, which is not feasible in a production environment. Thanks.
@aareet this is critical for us please help out.
Manually untainting the helm release before trying to redeploy is the current workaround: https://www.terraform.io/cli/commands/untaint
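For example, assuming the release resource address is `helm_release.gitlab` (a placeholder; substitute the address from your own configuration), the workaround looks like:

```shell
# Remove the taint so the next apply attempts an in-place upgrade
# instead of a destroy/recreate. "helm_release.gitlab" is an example
# resource address, not one from this issue.
terraform untaint helm_release.gitlab

# Re-run the apply to retry the release upgrade.
terraform apply
```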
Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!
Still needed
Any updates here?
Running into this myself and found this issue. In my case, chart installations whose initial requirements need to scale up underlying cluster resources can take a variable and unpredictable amount of time to complete. When they take longer than the configured timeout, we end up in this position.
It's been a while since I've done any Terraform provider development, but if memory serves, this behavior comes from Terraform core. When a resource fails during its initial create, Terraform takes what it perceives as the safest action and marks it tainted so it can attempt a clean recreate. In many cases this makes sense, even for Helm specifically, since Helm has hooks that run only on the first install, so recreating runs them again from what is assumed to be a "clean" starting point.
This behavior should be avoidable if the provider implementation does a partial save of the resource before failing out of the create function. I think this might be as simple as calling `d.SetId(...)` once the release has been created but before it has been confirmed successful. There is also a specific "partial" save concept in the SDK that was meant for resources implemented as a combination of multiple API interactions with the remote service.

If this is a viable approach, I would expect it to be optional per `helm_release` resource, as the current behavior may be required for charts that use `*-install` hooks.
Hopefully this helps move this issue along as the current behavior can definitely be counterproductive in many cases.
Terraform Version
Terraform v0.12.20
Provider Version
1.3.2
Affected Resource(s)
helm_release
Terraform Configuration Files
Expected Behavior
The resource should not be tainted. Some Helm charts contain CRDs, and if the chart fails to deploy during an upgrade, the subsequent destroy will delete those CRDs. This is very destructive, since deleting a CRD also deletes all of its custom resources.
This behavior differs from the old Helm 2 version of this provider: on failure it did not taint the resource, but instead tried to fix it by re-applying the Helm chart.
Actual Behavior
Terraform Plan shows
Steps to Reproduce
1. terraform apply
2. terraform plan