lud0v1c opened this issue 2 years ago
Hi @lud0v1c,
Thanks for filing the issue. Even in the absence of a state file, a backend must still enforce a lock to prevent multiple instances of Terraform from writing new state concurrently. This is implemented differently in each backend, depending on its storage constraints and locking mechanisms, so the failure modes differ slightly between backends.
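The contract being described can be sketched as a toy model (illustrative only, not Terraform's actual implementation): a lock is acquired together with an ID, and releasing it requires presenting that same ID, which is why `force-unlock` takes the lock ID as an argument. All names below are invented for the demo:

```shell
#!/bin/sh
# Toy model of backend state locking: acquiring stores a lock ID; releasing
# requires the matching ID, mirroring `terraform force-unlock <ID>`.
LOCKDIR="${TMPDIR:-/tmp}/tf-lock-demo.$$"

lock() {
  mkdir "$LOCKDIR" 2>/dev/null || return 1   # atomic: fails if already held
  echo "$1" > "$LOCKDIR/id"
}

force_unlock() {
  [ "$(cat "$LOCKDIR/id" 2>/dev/null)" = "$1" ] || return 1  # ID must match
  rm -rf "$LOCKDIR"
}

lock "76a2bab0" && echo "locked"
lock "other-client" || echo "second lock refused"
force_unlock "wrong-id" || echo "wrong id refused"
force_unlock "76a2bab0" && echo "unlocked"
```

If the holding process dies without releasing, the lock artifact simply persists, which is the situation in this issue.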
Since the process was killed without releasing the lock, and that backend has a persistent locking mechanism, the lock will have to be released manually. You should be able to do that by passing the lock ID to the `force-unlock` command:

```
terraform force-unlock 76a2bab0-12b1-5b0e-395e-46177a0fe849
```
If that doesn't work please let us know and we can mark this as an issue with the k8s backend.
Thanks!
Hey @jbardin, thank you for the reply. As I stated in Actual Behavior, `force-unlock` somehow doesn't work:
```
PS C:\Users\lud0v1c\orion-cluster> terraform force-unlock 76a2bab0-12b1-5b0e-395e-46177a0fe849
2021-12-13T15:38:47.907Z [INFO] Terraform version: 1.1.0
2021-12-13T15:38:47.907Z [INFO] Go runtime version: go1.17.2
2021-12-13T15:38:47.908Z [INFO] CLI args: []string{"C:\\ProgramData\\chocolatey\\lib\\terraform\\tools\\terraform.exe", "force-unlock", "76a2bab0-12b1-5b0e-395e-46177a0fe849"}
2021-12-13T15:38:47.908Z [TRACE] Stdout is a terminal of width 137
2021-12-13T15:38:47.908Z [TRACE] Stderr is a terminal of width 137
2021-12-13T15:38:47.908Z [TRACE] Stdin is a terminal
2021-12-13T15:38:47.911Z [DEBUG] Attempting to open CLI config file: C:\Users\lud0v1c\AppData\Roaming\terraform.rc
2021-12-13T15:38:47.911Z [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2021-12-13T15:38:47.911Z [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2021-12-13T15:38:47.911Z [DEBUG] ignoring non-existing provider search directory C:\Users\lud0v1c\AppData\Roaming\terraform.d\plugins
2021-12-13T15:38:47.912Z [DEBUG] ignoring non-existing provider search directory C:\Users\lud0v1c\AppData\Roaming\HashiCorp\Terraform\plugins
2021-12-13T15:38:47.913Z [INFO] CLI command args: []string{"force-unlock", "76a2bab0-12b1-5b0e-395e-46177a0fe849"}
2021-12-13T15:38:47.914Z [TRACE] Meta.Backend: built configuration for "kubernetes" backend with hash value 2627546192
2021-12-13T15:38:47.915Z [TRACE] Preserving existing state lineage "5c443a4e-f465-eea1-9f23-69cedf912e70"
2021-12-13T15:38:47.915Z [TRACE] Preserving existing state lineage "5c443a4e-f465-eea1-9f23-69cedf912e70"
2021-12-13T15:38:47.915Z [TRACE] Meta.Backend: working directory was previously initialized for "kubernetes" backend
2021-12-13T15:38:47.916Z [TRACE] Meta.Backend: using already-initialized, unchanged "kubernetes" backend configuration
2021-12-13T15:38:47.916Z [DEBUG] Using kubeconfig: C:\Users\lud0v1c\.kube\orion
2021-12-13T15:38:47.917Z [INFO] Successfully initialized config
2021-12-13T15:38:47.918Z [TRACE] Meta.Backend: instantiated backend of type *kubernetes.Backend
2021-12-13T15:38:47.918Z [DEBUG] checking for provisioner in "."
2021-12-13T15:38:47.918Z [DEBUG] checking for provisioner in "C:\\ProgramData\\chocolatey\\lib\\terraform\\tools"
2021-12-13T15:38:47.919Z [TRACE] Meta.Backend: backend *kubernetes.Backend does not support operations, so wrapping it in a local backend
Failed to load state: the state is already locked by another terraform client

Lock Info:
  ID:        76a2bab0-12b1-5b0e-395e-46177a0fe849
  Path:
  Operation: OperationTypeApply
  Who:       ZEUS\lud0v1c@zeus
  Version:   1.1.0
  Created:   2021-12-11 23:03:32.907186 +0000 UTC
  Info:
```
I've also tried `terraform state pull` to see if I could get a hint or a more descriptive error message about where the lock is defined/stored, but nothing. The backend k8s cluster also doesn't report anything out of the ordinary (no state there, as I've mentioned).
Thanks @lud0v1c, that looks like a bug in the kubernetes backend implementation, which is somehow preventing any access even when only deleting the lock. The kubernetes lock is implemented via a Lease, which is separate from the state object. I'm not sure offhand what the required commands are, but there is probably a way to list and delete existing leases with the `kubectl` command directly.
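For reference, a sketch of how the relevant lease name can be worked out, assuming the naming pattern visible in this thread (the state secret is `tfstate-<workspace>-<secret_suffix>`, and the lock lease is that name prefixed with `lock-`); the workspace and suffix values are placeholders for this setup:

```shell
#!/bin/sh
# Derive the lease name the kubernetes backend appears to use.
# Pattern assumed from the listing in this issue: lock-<state secret name>.
workspace="default"
secret_suffix="state"
secret="tfstate-${workspace}-${secret_suffix}"
lease="lock-${secret}"
echo "$lease"

# On a real cluster (not executed here), inspect and remove the stale lease:
#   kubectl get leases
#   kubectl delete lease "$lease"
```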
@jbardin Thank you for the hint! I've read about Leases in the k8s backend documentation, but hadn't really interacted with them before. After running `kubectl delete lease lock-tfstate-default-tfstate`, I was able to `init` and continue as normal.
This is what I had listed in the default namespace:
```
NAME                           HOLDER                                 AGE
lock-tfstate-default-tfstate                                          2d
lock-tfstate-default-state     76a2bab0-12b1-5b0e-395e-46177a0fe849   47h
```
I'm not sure if I should close the issue in case you want to investigate further, so I'll leave it to your discretion 😃
Thanks for the info @lud0v1c! That's helpful if anyone else encounters this.
I'll leave the issue open, since the `terraform force-unlock` command should have been able to complete the same procedure.
### Terraform Version

### Terraform Configuration Files
### Debug Output

https://gist.github.com/lud0v1c/5e655d1a4fae07c69a217665435b56d2
### Expected Behavior

There's no state in the backend, so Terraform should create a new one.
### Actual Behavior

Every operation fails because another Terraform client is apparently performing an operation, as the debug output shows. This doesn't allow me to do anything: not `init`, `plan`, or `apply`. `force-unlock`, `state rm`, and `state pull` don't work either.
### Steps to Reproduce

1. `terraform init`
2. `kubectl delete secret tfstate-default-state`
3. `terraform init` or `terraform init -migrate-state`
### Additional Context

I set up a k3s cluster some days ago, and yesterday I switched from local state storage to storing the state on the cluster itself.

Deployed the backend and `terraform_remote_state` without any problem. Everything was OK until an operation I was performing (an apply on my desktop Windows PC) got killed due to network issues.

Knowing what this does, and since no changes were made, I deleted the tfstate secret in the cluster. I can confirm there are no tfstate secrets in any namespace whatsoever.

Looking this up online, people mentioned that it could be another client/process, but I looked at all local processes and even tried initializing on my laptop (with the computer that ran the original failed operation shut down); that also fails, always with the same error message.

I really can't understand where this state/operation is being fetched from; even rebooting my k8s nodes did nothing!