hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

Terraform 1.10.0 fails to release the state lock with error: unable to unlock workspace while state version upload is still pending #36155

Open kush-openai opened 21 hours ago

kush-openai commented 21 hours ago

Terraform Version

Terraform v1.10.0
on darwin_amd64

Terraform Configuration Files

terraform {
  required_version = "1.10.0"
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "xxxxxx"

    workspaces {
      name = "ankush-test-0"
    }
  }
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.23.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.109.0"
    }
  }
}

# PROVIDER SETUP

provider "azurerm" {
  subscription_id = "xxxxx"
  features {}
}

Debug Output

Error: Error releasing the state lock

Error message: unable to unlock workspace while state version upload is
still pending
Lock Info:
  ID:        xxxxxxxxxxxxxxxxxx
  Path:      
  Operation: OperationTypeApply
  Who:       root@github-runner-xxx
  Version:   1.10.0
  Created:   2024-12-03 03:08:52.868010616 +0000 UTC
  Info:      

Expected Behavior

The release-lock operation should be retried until the state version upload is complete.

Actual Behavior

The apply fails immediately with "Error releasing the state lock" instead of retrying the unlock.

Steps to Reproduce

We were able to reproduce this consistently with:

terraform plan -lock=false -out plan.out

terraform apply plan.out

This also happened when the plan was empty.

Additional Context

We believe this is caused by the following change, which applies only to Terraform CLI 1.10+ clients. It comes from the Terraform Enterprise release notes for v202410-1: https://developer.hashicorp.com/terraform/enterprise/releases/2024/v202410-1

""" Workspaces API unlock action will now return a 400 status instead of 503 when the latest state version is still pending, but only for Terraform CLI 1.10+ clients. """

Because a 400 is returned, the Terraform client does not retry and fails immediately. The retry logic is configured here: https://github.com/hashicorp/go-tfe/blob/f9d78881328030c3949b5ca1b0ff72465a74e0c0/tfe.go#L605. It only retries on 5xx (500+) status codes.
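The interaction between the server-side change and the client retry rule can be illustrated with a small Go model. This is a hypothetical sketch, not Terraform Enterprise or go-tfe code: `unlockStatusWhilePending` models the quoted release note, and `shouldRetry` is simplified to the "only 5xx is retried" rule described above (the real check in go-tfe's tfe.go may consider other cases).

```go
package main

import (
	"fmt"
	"net/http"
)

// unlockStatusWhilePending is a HYPOTHETICAL model of the release note:
// while the latest state version upload is pending, the unlock action
// returns 503 to older clients and 400 to Terraform CLI 1.10+ clients.
func unlockStatusWhilePending(cli110OrNewer bool) int {
	if cli110OrNewer {
		return http.StatusBadRequest // 400
	}
	return http.StatusServiceUnavailable // 503
}

// shouldRetry is a simplified stand-in for the client retry check:
// only 5xx responses are treated as retryable.
func shouldRetry(status int) bool {
	return status >= 500
}

func main() {
	for _, newer := range []bool{false, true} {
		status := unlockStatusWhilePending(newer)
		fmt.Printf("CLI 1.10+: %v, status: %d, retried: %v\n", newer, status, shouldRetry(status))
	}
}
```

Under this model a pre-1.10 client receives a 503 and keeps retrying until the upload finishes, while a 1.10+ client receives a 400, which the 5xx-only rule treats as final, so the unlock fails on the first attempt.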


kush-openai commented 21 hours ago

We safely downgraded to 1.9.8 as a mitigation, and the issue went away.

kush-openai commented 21 hours ago

This might be an interaction between the Terraform Enterprise backend and the retry logic in go-tfe: https://github.com/hashicorp/go-tfe/issues/1015