Closed jcarlson closed 4 years ago
First of all, I am extremely impressed with what you were able to accomplish. It is good to see that people are actually using this provider to solve real business needs.
Update() has been something I have been neglecting recently; I know there is some weird behavior around it. As for the issue you are dealing with: although the script failed to run, the environment block was still set via partial state. There is a way to override partial state, but I will need to look into the best way to do this.
In the meantime, you could choose to use a template file for your create/read/update/delete scripts and not use the environment block at all. Changing environment variables in the template file will thus change the command text, and trigger a ForceNew (taint) on the resource.
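A minimal sketch of that approach, assuming hypothetical template files `create.sh.tpl` and `delete.sh.tpl` and a variable `var.attr` (none of these names are from the original thread):

```hcl
# Render the scripts from templates; changing var.attr changes the rendered
# command text itself, which forces the resource to be destroyed and recreated.
resource "shell_script" "example" {
  lifecycle_commands {
    create = templatefile("${path.module}/create.sh.tpl", { attr = var.attr })
    delete = templatefile("${path.module}/delete.sh.tpl", { attr = var.attr })
  }
}
```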
Another problem you may not have considered is that an error in the script could prevent the resource from being deleted altogether. Consider the trivial case where I have a resource that does nothing but return an exit code of 0 for create and 1 (error) for delete. I update delete to return 0, "fixing" the problem in the script, and also tainting the resource and causing it to be destroyed and recreated - except the delete always fails because it keeps returning 1, and so the update never succeeds. What is unique about this provider compared to all other providers is that it allows hooks into the Terraform runtime, and if there is an error in there, then you don't have a way of proceeding without manually removing the resource from the state file.
To solve the above problem, I am thinking of adding a boolean "force_destroy" that will force the destruction of a resource and ignore any errors in the script. Errors are common during development, but not very common after you have been running the code in production for a while. So you could set force_destroy to true when developing, for faster iteration, and to false when running in production, to surface the error.
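A sketch of how the proposed attribute might look; note that `force_destroy` is only an idea at this point, not an implemented argument:

```hcl
resource "shell_script" "example" {
  # Hypothetical attribute: when true, ignore script errors during delete so a
  # broken delete script cannot leave the resource stuck in state.
  force_destroy = true

  lifecycle_commands {
    create = "echo '{}'"
    delete = "exit 1" # would normally block deletion
  }
}
```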
I noticed that I had, by accident, referred to my lifecycle commands by reference, as in `create = "${path.module}/create.sh"`, rather than inline, as in `create = file("${path.module}/create.sh")`.
The nice thing about "by reference" is that it allows me to tune and refine my script without causing the resource to be needlessly torn down and recreated. I'm not sure I agree that tinkering with the `delete` command should cause an otherwise unchanged resource to be destroyed and recreated.
A side effect of this that I noticed, though, is that when a resource does need to be destroyed and recreated, the environment variables passed to the script hold their current values.
Consider the following example:
```hcl
resource "shell_script" "immutable" {
  environment = {
    IMMUTABLE_ATTR = "abc"
  }

  lifecycle_commands {
    create = <<-CMD
      echo "PUT https://api.example.com/foos/$IMMUTABLE_ATTR"
      jq -n --arg immutable_attr $IMMUTABLE_ATTR '{ immutable_attr: $immutable_attr }'
    CMD
    delete = <<-CMD
      echo "DELETE https://api.example.com/foos/$IMMUTABLE_ATTR"
    CMD
  }
}
```
Since I have defined only a `create` and `delete` command, changing the [immutable] environment variable should necessitate a destroy/create pairing. Unfortunately, the `delete` command won't be able to properly delete the resource, since it will receive the new value of `IMMUTABLE_ATTR`, rather than the stored value that it needs.
This kind of pattern is necessary for cases where I have multiple resources that depend on each other; in the case of AWS WAFv2 resources, I must first create the web-acl, and then associate it with a load balancer. The ARN of the WebACL has to be an input to the ALB-WAF association resource, so an environment variable is the only way to get that value into the dependent resource.
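A hedged sketch of that dependency pattern; the `output` map attribute and the literal ARN string here are assumptions for illustration, not taken from the original thread:

```hcl
# The WebACL resource emits its ARN as JSON on create; the association resource
# consumes that ARN through its environment block, creating the dependency.
resource "shell_script" "web_acl" {
  lifecycle_commands {
    create = "echo '{\"arn\": \"arn:aws:wafv2:us-east-1:123456789012:regional/webacl/example\"}'"
    delete = "echo deleted"
  }
}

resource "shell_script" "waf_association" {
  environment = {
    WEB_ACL_ARN = shell_script.web_acl.output["arn"]
  }

  lifecycle_commands {
    create = "echo \"associate $WEB_ACL_ARN\""
    delete = "echo \"disassociate $WEB_ACL_ARN\""
  }
}
```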
As I write this up, I suppose a reasonable workaround would be to store any value necessary for proper deletion as state on the dependent resource; in other words, in the example above, the `delete` command should be reading `IMMUTABLE_ATTR` from stored state, not from an environment variable. 🤔
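A sketch of that workaround, assuming the provider passes the previous output map as JSON on stdin to the delete command:

```hcl
resource "shell_script" "immutable" {
  environment = {
    IMMUTABLE_ATTR = "abc"
  }

  lifecycle_commands {
    # create emits the value as JSON so it is captured in state.
    create = <<-CMD
      jq -n --arg immutable_attr $IMMUTABLE_ATTR '{ immutable_attr: $immutable_attr }'
    CMD
    # delete reads the stored value back from the state JSON on stdin,
    # rather than from the (possibly already-changed) environment variable.
    delete = <<-CMD
      STORED_ATTR=$(jq -r '.immutable_attr')
      echo "DELETE https://api.example.com/foos/$STORED_ATTR"
    CMD
  }
}
```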
I just released a new build that was specifically designed to make Update() easier to understand and work with. Please test it out and let me know what you think. I also added a better example for update in the examples folder ("complete-with-update"). This might help answer some of your questions regarding state. I do not use environment variables for state; I use stdin, which contains the information of any previous output map. I also removed ForceNew from Delete() and Update() because I didn't think they added any value.
If you have any problems with the latest version please open a new issue, I will close this one for now.
Another interesting issue I've run into that I can't seem to root-cause. I have a failure in my `update` command, and this is caught by Terraform during `terraform apply`. The apply fails and raises the error, but re-running `terraform apply` shows no changes pending. Inspecting `terraform.tfstate` reveals that the changes were persisted to state, even though the update command failed.

I think what should happen on failure is that the changes should not be persisted to state, so that the next `terraform apply` will try again to run the update command. Similarly, for the `create` command, if there is a failure in the script, the resource should be marked as tainted.

Notice the `plan` (below) references a change to the environment variable. After this `apply` is complete, the Terraform state file captures this change and reflects a final value of `"foo bar"` for this environment variable.

Any suggestions?
Here are some log dumps from a `terraform apply` attempting to `update` a resource: