pedroslopez opened this issue 2 years ago
Hmmm, I would've thought that the Terraform SDK handles that `timeouts` block for me, since we're just using the context provided directly from the SDK. I'll have to do some digging!
Oh! So this is interesting. After seeing these errors pop up, we added the `timeouts` block as shown in the sample config file, but applying those changes didn't actually set up the timeouts in the state file. New runs created after we set up the timeout do have them in the state file. I'm assuming the timeouts need to be in the state file so that the right value is used when the resources are destroyed.
The same can be reproduced with a simple usage of the `multispace_run` resource: setting a timeout after creation, or updating the timeout value, has no effect (and `terraform plan` shows no changes). I'm not sure if this is specific to this provider or something at a deeper level, though.
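For concreteness, an edit of the following shape on an already-created run is what I mean. This is just a sketch: the durations and names are made up, and I'm assuming the block uses the standard create/delete timeout keys. After this edit, `terraform plan` still reports no changes:

```hcl
resource "multispace_run" "example" {
  # placeholder values, not a real configuration
  organization = "example-org"
  workspace    = "example-workspace"

  timeouts {
    create = "30m"
    # bumped from "30m" to "2h" after the resource was first created;
    # terraform plan still reports "No changes."
    delete = "2h"
  }
}
```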
I do see that on resource update here https://github.com/mitchellh/terraform-provider-multispace/blob/main/internal/provider/resource_run.go#L106-L109 `nil` is simply returned and nothing else is done. Maybe something needs to be done there to update timeouts properly?
I don't know either. At least partially, this might be worth asking about on the Terraform core GitHub as well. I'll do some research here too, but it might be useful to have the two threads going in case there is a core (or core SDK) issue.
When a run is enqueued for a long time because the available workers are tied up, the multispace run errors with `context deadline exceeded`. I've noticed this specifically in `destroy` runs. A custom `timeouts` block has been set, but it doesn't seem to have any effect for `destroy` runs (the same issue happens on `create`, but there the error arrives after the configured timeout, as expected).

Terraform Version
Terraform 1.0.8, 1.0.9
multispace 0.1.0
Affected Resource(s)
multispace_run
Terraform Configuration Files
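The full configuration isn't reproduced here, but it is roughly of this shape (a minimal sketch; the organization/workspace values and durations are illustrative, and I'm assuming the standard create/delete timeout keys apply):

```hcl
resource "multispace_run" "staging" {
  # placeholder values, not the real organization/workspace
  organization = "example-org"
  workspace    = "example-workspace"

  timeouts {
    create = "30m"
    delete = "30m" # intended destroy timeout; the run still fails after ~15 minutes (see Actual Behavior)
  }
}
```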
Debug Output
Please provide a link to a GitHub Gist containing the complete debug output: https://gist.github.com/pedroslopez/fffcbb4f1786246ddea8d84dacfebac5
Gist from a different workspace where I was able to reproduce the issue.
Expected Behavior
What should have happened?
On destroy, the `multispace_run` should have waited up to the configured destroy timeout while the related run was still queued, or ideally it should keep waiting as long as the run is still queued.

Actual Behavior
What actually happened?
After 15 minutes, the run failed with `context deadline exceeded`. The run triggered by `multispace_run` eventually ran once the workers became available, but by then the deadline error had already happened.

Steps to Reproduce
This can easily be reproduced in a free Terraform Cloud organization where there are not enough workers to process the triggered run. Just have the `multispace_run` trigger a `destroy` run and see that it only waits up to 15 minutes, failing with `context deadline exceeded`.

Important Factoids
Is there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?
Pretty standard Terraform Cloud for Business organization, but we only have 3 workers, so we run into this issue when multiple workspaces whose resources take a long time to clean up are being destroyed at the same time.