hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Remove stale locks/Ability to add timeout to locks #25776

Open abergmeier opened 4 years ago

abergmeier commented 4 years ago

Terraform Version

0.12.28

Expected Behavior

There should be a mechanism to get rid of the Lock even if the Process dies suddenly.

Actual Behavior

The lock persists and nobody can use Terraform anymore. A developer needs to remove the lock manually. This is fine as long as CI is stable, but gets really annoying if CI terminates Terraform very frequently.

Steps to Reproduce

Run kill -9 on terraform while it is planning.

Fix to this

The most elegant fix, IMO, would be to add an optional timeout option to backends. For example, with a timeout of 2 minutes, Terraform would touch the lock every minute to keep it from timing out. If a lock is present and older than 2 minutes, other Terraform processes would be allowed to remove it.
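The proposal above amounts to a lease with a heartbeat. Below is a minimal Python sketch of that idea, not Terraform's implementation: `LeaseLockStore`, `Lease`, and all method names are hypothetical, the store is an in-memory stand-in for a backend, and the caller passes in the current time so the expiry logic is easy to follow. A real backend would also need an atomic compare-and-swap when taking over a stale lock.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    owner: str
    last_touched: float  # seconds, on whatever clock the caller supplies


class LeaseLockStore:
    """Hypothetical in-memory stand-in for a backend's lock storage."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._leases: Dict[str, Lease] = {}

    def _is_stale(self, lease: Lease, now: float) -> bool:
        # A lease is stale once it has gone untouched for longer than the timeout.
        return now - lease.last_touched > self.timeout_seconds

    def acquire(self, key: str, owner: str, now: float) -> bool:
        lease = self._leases.get(key)
        held_by_other = lease is not None and lease.owner != owner
        if held_by_other and not self._is_stale(lease, now):
            return False  # a live lock held by someone else
        # Free, already ours, or stale: take (or take over) the lock.
        self._leases[key] = Lease(owner=owner, last_touched=now)
        return True

    def touch(self, key: str, owner: str, now: float) -> bool:
        # Heartbeat: only the current owner may refresh the lease.
        lease = self._leases.get(key)
        if lease is None or lease.owner != owner:
            return False
        lease.last_touched = now
        return True

    def release(self, key: str, owner: str) -> bool:
        lease = self._leases.get(key)
        if lease is None or lease.owner != owner:
            return False
        del self._leases[key]
        return True
```

With `timeout_seconds=120` and a heartbeat every 60 seconds, a process that is killed stops calling `touch`, and the next `acquire` made more than 120 seconds after the last heartbeat succeeds. A failed `touch` tells a slow but still living process that it has lost the lock and should stop writing.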

danieldreier commented 4 years ago

@abergmeier I think you're looking for the force-unlock command described in https://www.terraform.io/docs/commands/force-unlock.html. Does that cover what you need?

abergmeier commented 4 years ago

"I think you're looking for the force-unlock command"

On the contrary, I am looking for a mechanism that makes force-unlock mostly unnecessary when the machine running terraform is not that stable.

aweingarten commented 3 years ago

This is a very real concern on our project too. We worry about unstable provisioning infrastructure acquiring a lock, dying, and never releasing it.

Since we are using terraform as part of an existing automation workflow, force-unlock would represent a support escalation.

For now we are planning to write a bash script that, before running apply, will delete any lock older than n minutes. We would like this to be a first-class capability.
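The age check described above could look roughly like the following Python sketch (the comment mentions bash; Python is used here only for illustration). The function names are hypothetical, and how the lock ID and its creation time are read depends on the backend, so both are taken as inputs. The function only builds the `terraform force-unlock` command and leaves running it to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def lock_is_stale(created: datetime, now: datetime, max_age: timedelta) -> bool:
    # A lock counts as stale once it is older than the allowed maximum age.
    return now - created > max_age


def force_unlock_command(
    lock_id: str, created: datetime, now: datetime, max_age: timedelta
) -> Optional[List[str]]:
    """Return the unlock command for a stale lock, or None if the lock is fresh.

    lock_id and created are assumed to come from the backend's lock record;
    obtaining them is backend-specific and not shown here.
    """
    if not lock_is_stale(created, now, max_age):
        return None
    # -force skips the interactive confirmation prompt.
    return ["terraform", "force-unlock", "-force", lock_id]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    created = now - timedelta(minutes=45)  # example value, not real lock data
    print(force_unlock_command("example-lock-id", created, now, timedelta(minutes=30)))
```

The returned list can be passed to `subprocess.run` before the apply step. An age check alone cannot tell a dead process from a long-running apply, so n has to exceed the longest legitimate run.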

Nadavpe commented 2 years ago

We're facing similar issues, mostly when killing a GitHub Actions workflow, which happens every now and then.

The 2 minutes suggested above must be configurable, since some resources take longer to respond. The value should live in the state's configuration, so that admins can set the desired lock timeout.

SathishKumarRamasamy commented 2 years ago

+1 We would need this feature to avoid infinite locks when the process gets killed.