hashicorp / consul-terraform-sync

Consul Terraform Sync is a service-oriented tool for managing network infrastructure near real-time.
Mozilla Public License 2.0

Error acquiring the state lock when executing task when backend set as consul #546

Open sameer666 opened 2 years ago

sameer666 commented 2 years ago

Describe the bug

When the backend is not set explicitly in the driver "terraform" block, CTS sets it to consul. When the task is executed, Terraform tries to acquire the state lock but fails with the following error:

2021-12-16T16:01:29.834+0530 [INFO] ctrl: driver initialized
2021-12-16T16:01:29.834+0530 [INFO] ctrl: executing all tasks once through
2021-12-16T16:01:29.835+0530 [DEBUG] ctrl: watching dependencies: dependency_size=2
2021-12-16T16:01:30.057+0530 [DEBUG] driver.terraform: change detected for task: task_name=web
2021-12-16T16:01:30.081+0530 [INFO] ctrl: executing task: task_name=web
2021-12-16T16:01:35.681+0530 [ERROR] cli: error running controller in Once mode: error= could not apply changes for task web: error tf-apply for 'web': exit status 1
Error: Error loading state: failed to lock state in Consul:

When the backend is set to local, the execution works as expected.

When the module is run directly with Terraform instead of through Consul Terraform Sync, it also works with the backend set to consul.
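
For readers following along, the local-backend workaround mentioned above could be expressed in the CTS configuration roughly as follows. This is only a sketch: it assumes the `backend` block that CTS accepts inside `driver "terraform"`, and the empty `backend "local"` body (relying on Terraform's default state path) is an assumption rather than the exact config used in this report.

```hcl
driver "terraform" {
  version = "1.0.11"

  # Assumption: an explicit backend block overrides the default Consul KV backend.
  # With no arguments, Terraform's local backend uses its default state path.
  backend "local" {}

  required_providers {
    fmc = {
      source  = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
}
```

Omitting the `backend` block entirely reproduces the failing setup, since CTS then generates a `backend "consul"` block in main.tf.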

Versions

Consul Terraform Sync

consul-terraform-sync v0.4.2 (bc2b2a0)
Compatible with Terraform >= 0.13.0, < 1.1.0

Consul Version

Consul 1.10.4

Terraform Version

Terraform v1.0.11

Configuration File(s)

Using CTS to monitor 2 services in Consul and call a module to create dynamic object mappings on Cisco FMC:

```hcl
log_level = "DEBUG"

consul {
  address =
}

driver "terraform" {
  version = "1.0.11"
  required_providers {
    fmc = {
      source = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
}

terraform_provider "fmc" {
  fmc_username =
  fmc_password =
  fmc_host =
  fmc_insecure_skip_verify = true
}

task {
  name = "web"
  description = "update policies based on node availability"
  source = "home/user/terraform-fmc-dynamicobject"
  providers = ["fmc"]
  services = ["web","api"]
}
```

Terraform Configuration Files Generated by Consul-Terraform-Sync

Contents of main.tf:

```terraform
# This file is generated by Consul Terraform Sync.
#
# The HCL blocks, arguments, variables, and values are derived from the
# operator configuration for Sync. Any manual changes to this file
# may not be preserved and could be overwritten by a subsequent update.
#
# Task: web
# Description: update policies based on node availability

terraform {
  required_version = ">= 0.13.0, < 1.1.0"
  required_providers {
    fmc = {
      source = "CiscoDevNet/fmc"
      version = "0.2.1"
    }
  }
  backend "consul" {
    address =
    gzip = true
    path = "consul-terraform-sync/terraform"
  }
}

provider "fmc" {
  fmc_host = var.fmc.fmc_host
  fmc_insecure_skip_verify = var.fmc.fmc_insecure_skip_verify
  fmc_password = var.fmc.fmc_password
  fmc_username = var.fmc.fmc_username
}

# update policies based on node availability
module "web" {
  source = "/home/user/terraform-fmc-dynamicobject"
  services = var.services
}
```

Contents of terraform.tfvars:

```terraform
# This file is generated by Consul Terraform Sync.
#
# The HCL blocks, arguments, variables, and values are derived from the
# operator configuration for Sync. Any manual changes to this file
# may not be preserved and could be overwritten by a subsequent update.
#
# Task: web
# Description: update policies based on node availability

services = {
  "api.ip-1-1-1-1.dc1" = {
    id = "api"
    name = "api"
    kind = ""
    address = "1.1.1.1"
    port = 9090
    meta = {}
    tags = []
    namespace = ""
    status = "passing"
    node = "ip-1-1-1-1"
    node_id = ""
    node_address = "1.1.1.1"
    node_datacenter = "dc1"
    node_tagged_addresses = {
      lan = "1.1.1.1"
      lan_ipv4 = "1.1.1.1"
      wan = "1.1.1.1"
      wan_ipv4 = "1.1.1.1"
    }
    node_meta = {
      consul-network-segment = ""
    }
    cts_user_defined_meta = {}
  },
}
```

Expected Behavior

The task is executed and the dynamic objects are updated with the new IP address mappings.

Actual Behavior

The task is not executed; it fails with the error "Error acquiring the state lock".

mkam commented 2 years ago

Hi @sameer666, thanks for reporting this issue! I've got a few questions to help us debug and reproduce this problem.

  1. It looks like the CTS configuration and generated Terraform configuration you've provided are the working config, since the backend is set to local. Could you update the CTS config and main.tf with the versions where Consul is the backend?

  2. Is the session in the error message an empty string or have you redacted the actual value? Could you edit the removed values to be <redacted> so that we can distinguish them from empty strings?

  3. Can you give an overview of your Consul setup and would you be able to share any relevant Consul logs for the 500 error?

  4. Consul documents a list of situations in which a session is invalidated. Do you think any of these scenarios could be happening while you are running CTS?

  5. Is the error happening when CTS is first started or is it happening while CTS is running after initialization has completed?

sameer666 commented 2 years ago

  1. Edited with the config that is causing the issue.
  2. Edited.
  3. I have EC2 instances set up in AWS running the Consul agents: 2 Consul servers behind a load balancer, and one web server running a Consul agent with a service on it that is registered to the Consul servers. I am running Consul Terraform Sync on my local system, and the target device, Cisco FMC, is also hosted on AWS.
  4. None of those cases match.
  5. It happens when CTS detects a change and the task needs to be executed.

mkam commented 2 years ago

Thanks for the clarifications! Nothing is standing out to me as the root cause of your issue, and I haven't been able to reproduce it. Here are some debugging steps you could try next:

  1. Could you delete the working directory for the task and run CTS again? It should be sync-tasks/<taskname>
  2. Could you try to delete the backend state in Consul KV? The command to do so is consul kv delete consul-terraform-sync/terraform-env:<taskname>
  3. Could you comment out your task and configure a different test task? One example you can use is:
    task {
      name = "test-task"
      source = "mkam/hello/cts"
      providers = ["local"]
      services = ["web"]
    }