Open steeling opened 4 years ago
I'm not sure if terraform supports locking by resource, or just locks the entire state file. If currently doing the latter, even adding a lock command for the entire file would be helpful.
Additionally, adding a lock command could help if I want to do multiple commands transactionally. ie: from https://github.com/hashicorp/terraform/issues/26423
tf lock
tf check-for-diff -lock=false
tf apply -lock=false
tf force-unlock
Hi @steeling! Thanks for this enhancement request.
Terraform does indeed currently model locking as a whole-workspace idea (usually implemented by locking the object that's storing the state, as you mentioned). Some of the locking implementations are also unable to hold a lock without keeping a terraform process running to hold it, and so that's why Terraform doesn't currently have a command to just create a lock without its lifecycle being connected to some other operation.
A possible compromise here could be a command that takes the lock and then blocks at the terminal until it is interrupted by something like Ctrl+C, so you can therefore hold a Terraform lock even though Terraform isn't currently actually doing anything, but the terraform process still exists to hold it.
I think you could emulate this today by making a throwaway change to your configuration, running terraform apply, and then leaving Terraform waiting for confirmation while you do something else; Terraform holds the lock while it awaits approval for the plan, so you can in principle use it as a weird way to grab a lock and then eventually just say "no" at the confirmation prompt to release the lock without changing anything.
With all of that said, it would of course not help very much with the "running multiple Terraform commands transactionally" idea because in that case you explicitly want the lock to outlive a particular terraform process, and for those other commands to somehow pick up the same lock rather than trying to create a new one (which would otherwise deadlock).
Hey @apparentlymart, thanks for the detailed response! Would you mind explaining how a separate process determines how the lock is currently being held? Is it implementation specific depending on where the state is stored (ie: 1 impl for azure blob store, and another for GCS, or something more generic?)
Ya, I don't think the running process would meet our needs, unfortunately. Also, instead of passing the lock from one process to another, I think we could model it like code, where I grab the lock and the other terraform actions do things without the lock (or even without knowledge that the lock is held, ie: by supplying the -lock=false flag). Consider the following golang pseudocode:
var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
diffs, err := terraform.Reconcile(lock=false) // doesn't know the lock is held
if err != nil {
return err
}
if !diffs {
terraform.Apply(lock=false)
}
return
Here's a thought on how this could be accomplished given the current locking mechanisms:
Every command that currently grabs the lock would do the following. Supplying -lock=false would skip* steps 1 & 2:
lock_status to determine if it is locked asynchronously
The lock/unlock command would be a special case:
lock_status to determine if it is locked asynchronously
*Note: on skipping steps 1 & 2, it might make more sense to skip just step 2. I find it hard to imagine a scenario where one would want commands to race with each other, although maybe I'm just not thinking hard enough :)
Eventually lock_status could also be moved to each individual terraform resource
Thanks in advance for entertaining this discussion!
Looking at some of the implementations, I can answer my own question above: the locking mechanism is implementation-specific. Following up on that, the above pseudocode is only necessary for those specific implementations, while the rest (majority?) can just grab the lock and return.
@apparentlymart, looking into this more, it seems like terraform is doing something more complicated than simply grabbing the resource lock, ie: on an azurerm backend, if I grab the blob lease and run tf plan -lock=false, I get:
Error: Error loading state: failed to lock azure state: 2 errors occurred:
This seems like a pretty basic feature to ensure transactionality between multiple requests, and allowing a simple mechanism for oncall ops to prevent automation from rolling forward.
Hi @steeling,
The backends all have pretty different implementations of the locking interfaces with different requirements and tradeoffs, and all of them have been through many iterations to get their behavior right against the quirks of each service, so unfortunately I don't think we can consider any change to the locking model to be a "basic feature". That doesn't mean it isn't a valid feature request, but it does mean it will require a considerable design effort and is something we're unlikely to tackle in the near future due to our focus being elsewhere.
Hi @apparentlymart, thanks for the reply! That's very reasonable :)
Submitted https://github.com/hashicorp/terraform/pull/26572 to see if I can poke around in this space.
Also submitted https://github.com/hashicorp/terraform/pull/26561 to fix azure force-unlock, which doesn't work in non-default workspaces.
Hi there,
Apologies if this is already possible, but I don't see a command listed here
I'd like to propose adding a command so that a user can lock a resource to prevent both automation and other users from making changes during outages. This is a principle taken from lock-out/tag-out, used in industrial equipment maintenance, and applied to software maintenance.