oprudkyi opened this issue 2 years ago
Hi @oprudkyi! Thanks for sharing this bug report.
I just want to confirm that I'm understanding what you were expecting, and what you actually observed.
You used -lock-timeout=30m here, so I assume you were intending for Terraform to keep retrying to obtain the lock for up to 30 minutes if it is already held.
But I think you are saying that sometimes (with no discernible pattern) Terraform just fails immediately with this error, without waiting for the 30 minute timeout.
Is that a correct understanding of what you reported here? Thanks!
Hi @apparentlymart, yes, you are right. Instead of waiting 30 minutes, it fails after about 30 seconds.
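For reference, the behavior both comments above expect from -lock-timeout can be sketched as a retry loop: keep retrying while another process holds the lock, and give up only once the timeout has elapsed. This is a minimal Python sketch, not Terraform's code; LockHeld, lock_with_timeout and the backoff values (1 s doubling up to 16 s) are hypothetical. The important property is that only a "lock is held" error is retried, so any error the loop does not recognize as "lock held" ends the wait early, which would match the failures reported here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockHeld(Exception):
    """Hypothetical error: another process currently holds the lock."""

    def __init__(self, holder: str) -> None:
        super().__init__(f"lock held by {holder}")
        self.holder = holder


def lock_with_timeout(
    try_lock: Callable[[], T],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 16.0,
) -> T:
    """Call try_lock() until it succeeds or timeout_s seconds have passed.

    LockHeld means "wait and retry"; any other exception propagates at once.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return try_lock()
        except LockHeld:
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # timeout reached: surface the last "lock held" error
            sleep(min(delay, remaining))  # never sleep past the deadline
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


class FakeClock:
    """Deterministic clock so the loop can be demonstrated without waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()

    def try_lock() -> str:
        # Another process holds the lock for the first 100 simulated seconds.
        if clock.now < 100:
            raise LockHeld("other-ci-job")
        return "lock-id"

    print(lock_with_timeout(try_lock, timeout_s=30 * 60, clock=clock, sleep=clock.sleep))  # prints lock-id
    print(clock.now)  # prints 111.0
```

Injecting clock and sleep keeps the sketch testable; in real use the defaults (time.monotonic, time.sleep) apply.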
Hi, are there any updates on this? I have run into the same issue.
I am guessing it is a race condition due to GCS eventual consistency?
Hi, we're facing this issue a lot lately: Terraform does not respect the lock timeout and fails instantly with the message in the OP's Actual Behavior section.
Has anyone found a workaround, or steps that could be implemented to alleviate the issue?
Does anyone have a solution? This is still a problem with GCP.
@PhillyWebGuy I rerun the CI/CD pipeline manually :(
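Until this is fixed, the manual rerun can be automated. This is a minimal Python sketch, assuming (as the reports here suggest) that the error is raised while acquiring the lock, before Terraform changes anything, so rerunning the same command is safe; please confirm that for your own pipeline. The wrapper and its names are hypothetical, and the matched substrings are copied from the error output quoted in this thread.

```python
import subprocess
import sys
import time
from typing import Sequence

# Substrings copied from the error output reported in this thread.
LOCK_ERROR = "Error acquiring the state lock"
RACE_HINTS = ("conditionNotMet", "object doesn't exist")


def is_transient_lock_error(output: str) -> bool:
    """Heuristic match for the GCS lock failure discussed in this issue."""
    return LOCK_ERROR in output and any(hint in output for hint in RACE_HINTS)


def run_with_lock_retries(cmd: Sequence[str], attempts: int = 3, pause_s: float = 10.0) -> int:
    """Run cmd; rerun it only if it failed with the lock error above.

    Output is captured, so it is echoed after each attempt finishes
    rather than streamed live.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        sys.stdout.write(proc.stdout)
        sys.stderr.write(proc.stderr)
        if proc.returncode == 0 or not is_transient_lock_error(proc.stdout + proc.stderr):
            return proc.returncode  # success, or a failure we should not retry
        if attempt < attempts:
            print(f"lock error, retrying ({attempt}/{attempts})", file=sys.stderr)
            time.sleep(pause_s)
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical usage:
    # python retry_terraform.py terraform apply -auto-approve -lock-timeout=30m -no-color
    sys.exit(run_with_lock_retries(sys.argv[1:]))
```

Retrying only on the matched lock error keeps genuine plan/apply failures from being rerun.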
If I try to run it manually/locally, terraform init gives me this:
Initializing the backend...
╷
│ Error: storage.NewClient() failed: dialing: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
So I change my backend.tf to look like this:
terraform {
  backend "gcs" {
    bucket      = "bucket-xxx-tfstate"
    prefix      = "terraform/state"
    credentials = "my-credentials-file.json" # <-- Added this
  }
}
Once I add that credentials value, it works. But that doesn't really help with the automated GitHub Actions workflow I'd like to use. In GitHub Actions, the terraform init command does not fail, and it actually creates a default.tfstate file but not a default.tflock file.
I don't know if this is the problem other people are having, but to summarize: (1) terraform init seems to work with GCP/GitHub Actions, since the error is only thrown when running locally; (2) default.tfstate gets written, but not default.tflock.
My GitHub Actions workflow.yaml file:
name: 'Terraform CI'
on:
  push:
    branches:
      - develop
  pull_request:
jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v1
      - name: Terraform Init
        run: terraform init
        env:
          GOOGLE_CREDENTIALS: ${{secrets.GOOGLE_CREDENTIALS}}
      - name: Terraform Format
        run: terraform fmt -check
      - name: Terraform Plan
        run: terraform plan
        env:
          GOOGLE_CREDENTIALS: ${{secrets.GOOGLE_CREDENTIALS}}
      - name: Terraform Apply
        run: terraform apply -auto-approve -lock-timeout=5m
        env:
          GOOGLE_CREDENTIALS: ${{secrets.GOOGLE_CREDENTIALS}}
My main.tf:
resource "google_storage_bucket" "xxxx_0_logs" {
  name          = "xxx-0-logs"
  force_destroy = true
  location      = "US"
  storage_class = "STANDARD"
  versioning {
    enabled = true
  }
}
Just a reminder: we are experiencing this error daily, 3-5 times per run of 20-40 concurrent Terraform processes. Our pipeline looks something like this:
terraform_remote_state
terraform apply -auto-approve -lock-timeout=30m -no-color
I have the same issue. Please note that the bucket was empty before the run.
Error loading state: 2 errors occurred:
* writing "gs://XXXXXXXXXXX/default.tflock" failed: googleapi: Error 412: At least one of the pre-conditions you specified did not hold., conditionNotMet
* storage: object doesn't exist
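One possible reading of the "2 errors occurred" output above (an interpretation, not confirmed in this thread): the GCS backend creates default.tflock with a "must not already exist" precondition, so a 412 conditionNotMet means another process created the lock first. Terraform then reads the existing lock object to report who holds it; if the holder has released (deleted) the lock in between, that read fails with "storage: object doesn't exist" and no holder info is available. If the retry behind -lock-timeout only waits on errors that carry the holder's info, this case would fail immediately. GCS object operations are strongly consistent, so this does not need eventual consistency to happen; it is an ordinary check-then-read race, which would also explain why it appears with many concurrent processes. The Python below simulates that interleaving with an in-memory bucket; FakeBucket, try_lock and the retryable rule are hypothetical stand-ins, not the backend's API.

```python
from typing import Callable, Dict, List, Optional


class PreconditionFailed(Exception):
    """Stands in for 'googleapi: Error 412: ... conditionNotMet'."""


class ObjectNotExist(Exception):
    """Stands in for 'storage: object doesn't exist'."""


class FakeBucket:
    """In-memory stand-in for a GCS bucket (illustration only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def create_if_absent(self, name: str, data: str) -> None:
        # Models a write with a "does not exist" precondition.
        if name in self.objects:
            raise PreconditionFailed(f'writing "{name}" failed: Error 412 conditionNotMet')
        self.objects[name] = data

    def read(self, name: str) -> str:
        if name not in self.objects:
            raise ObjectNotExist("storage: object doesn't exist")
        return self.objects[name]

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


class LockError(Exception):
    def __init__(self, errors: List[Exception], holder: Optional[str]) -> None:
        super().__init__(f"{len(errors)} error(s): " + "; ".join(map(str, errors)))
        self.errors = errors
        self.holder = holder  # None when the current holder could not be read

    @property
    def retryable(self) -> bool:
        # Assumed rule: only wait and retry when we know who holds the lock.
        return self.holder is not None


def try_lock(bucket: FakeBucket, lock_name: str, me: str,
             between_write_and_read: Callable[[], None] = lambda: None) -> None:
    try:
        bucket.create_if_absent(lock_name, me)
    except PreconditionFailed as write_err:
        between_write_and_read()  # hook used to inject the racing release
        try:
            holder = bucket.read(lock_name)
        except ObjectNotExist as read_err:
            raise LockError([write_err, read_err], None) from None
        raise LockError([write_err], holder) from None


if __name__ == "__main__":
    bucket = FakeBucket()
    lock = "bucket_prefix/default.tflock"
    bucket.create_if_absent(lock, "process A")  # A holds the lock

    # Case 1: B collides with A while A still holds the lock.
    try:
        try_lock(bucket, lock, "process B")
    except LockError as err:
        print(err.retryable, err.holder)  # prints True process A

    # Case 2: A releases between B's failed write and B's read.
    try:
        try_lock(bucket, lock, "process B", between_write_and_read=lambda: bucket.delete(lock))
    except LockError as err:
        print(err.retryable, len(err.errors))  # prints False 2
```

Under this reading, case 2 reproduces the two-error message and is not retried, even though the lock is actually free at that moment.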
Wouldn't the fix be to store the lock file at the prefix path? That way, multiple state files that all live in the same bucket would not be forced to share the same lock file.
@duxbuse No, that would amount to disabling locking altogether, with dire consequences.
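On the lock path question: as far as I understand the GCS backend's naming, state and lock objects are already stored per prefix and per workspace, as <prefix>/<workspace>.tfstate and <prefix>/<workspace>.tflock. That is consistent with the default.tfstate/default.tflock objects mentioned earlier and with the lock paths under bucket_prefix/... in the error output in this thread. So states with different prefixes in one bucket should already use different lock files; the failures here involve processes contending for the same lock. A tiny Python sketch of that assumed naming (the helper is hypothetical):

```python
import posixpath
from typing import Tuple


def gcs_backend_object_names(prefix: str, workspace: str = "default") -> Tuple[str, str]:
    """Return the assumed (state, lock) object names for a GCS backend prefix."""
    base = posixpath.join(prefix, workspace)  # GCS object names use "/" separators
    return base + ".tfstate", base + ".tflock"


if __name__ == "__main__":
    print(gcs_backend_object_names("bucket_prefix/dir1/terraform1"))
    # prints ('bucket_prefix/dir1/terraform1/default.tfstate', 'bucket_prefix/dir1/terraform1/default.tflock')
```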
Please note that I run this setup almost every day, and I sometimes get this error.
Env: Terragrunt 0.35.10, Terraform 1.1.3
Output:
Group 1
Group 2
=============== cut ===============
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│   * writing
│   "gs://buket_name/bucket_prefix/dir1/terraform1/default.tflock"
│   failed: googleapi: Error 412: At least one of the pre-conditions you
│   specified did not hold., conditionNotMet
│   * storage: object doesn't exist
│
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
================= cut ========================
Terraform has been successfully initialized!
╷
│ Error: Error acquiring the state lock
│
│ Error message: writing
│ "gs://buket_name/bucket_prefix/dir1/terraform1/default.tflock"
│ failed: googleapi: Error 412: At least one of the pre-conditions you
│ specified did not hold., conditionNotMet
│ Lock Info:
│ ID: XXXXXXXXXXXXX
│ Path: gs://buket_name/bucket_prefix/dir1/dir2/default.tflock
│ Operation: OperationTypePlan
│ Who: my_server_name
│ Version: 1.1.3
│ Created: 2023-07-03 20:15:46.496329978 +0000 UTC
│ Info:
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
╵
Fixed in another tool; closing this now as irrelevant.
Could you link to where or how this was fixed? We ran into the same issue today.
With Terraform this still happens to us. Maybe we can reopen the issue here.
With apologies to @oprudkyi, I agree it would make sense to leave the issue open here. It is possible to re-report it as a new issue, but there is enough history in this issue to make it more desirable to simply keep this issue open until the GCS team works on it. Thanks!
In CI/CD scenarios, the same lock is sometimes requested concurrently by several processes. Randomly, one of them fails; restarting the failed process fixes the error.
Terraform Version
Terraform Configuration Files
Expected Behavior
Lock obtained
Actual Behavior
Steps to Reproduce
terraform apply -auto-approve -lock-timeout=30m -no-color
Additional Context
Up to 20 processes may run apply against the same lock file/GCS backend.
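To reproduce, the report suggests starting many copies of the same apply against one backend. A minimal Python harness for that, assuming terraform is on PATH and the working directory holds a throwaway configuration (such as the single-bucket main.tf posted in this thread); run_concurrently and summarize are hypothetical helpers. Since every copy uses -lock-timeout=30m, the expected result is that all copies eventually succeed one after another; any copy that fails with "Error acquiring the state lock" well before 30 minutes reproduces this issue.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple

# Command taken from "Steps to Reproduce" above.
APPLY = ["terraform", "apply", "-auto-approve", "-lock-timeout=30m", "-no-color"]
LOCK_ERROR = "Error acquiring the state lock"


def run_concurrently(cmd: Sequence[str], copies: int = 20,
                     cwd: Optional[str] = None) -> List[subprocess.CompletedProcess]:
    """Start `copies` instances of cmd at once and wait for all of them."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [
            pool.submit(subprocess.run, list(cmd), cwd=cwd, capture_output=True, text=True)
            for _ in range(copies)
        ]
        return [f.result() for f in futures]


def summarize(results: Sequence[subprocess.CompletedProcess]) -> Tuple[int, int, int]:
    """Return (total runs, failed runs, runs that failed with the lock error)."""
    failed = [r for r in results if r.returncode != 0]
    lock_failed = [r for r in failed if LOCK_ERROR in r.stdout + r.stderr]
    return len(results), len(failed), len(lock_failed)


if __name__ == "__main__":
    total, failed, lock_failed = summarize(run_concurrently(APPLY))
    print(f"{total} runs, {failed} failed, {lock_failed} with the lock error")
    sys.exit(1 if failed else 0)
```

Threads are enough here because each worker only waits on a child process.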