hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Terraform does not handle local-exec command that did not terminate properly #35657

Closed LaurentLesle closed 1 month ago

LaurentLesle commented 2 months ago

Terraform Version

1.9.5

Terraform Configuration Files

locals {
  command     = <<-EOT
    sudo bash -c "
      az network bastion tunnel \
      --name snap-vueg \
      --resource-group rg-vueg \
      --target-resource-id /subscriptions/000000000000000/resourceGroups/rg-vueg/providers/Microsoft.Compute/virtualMachineScaleSets/vmss-vueg/virtualMachines/2 \
      --resource-port 22 --port 2230

      echo "Leaving batch."
    "
    EOT
}
resource "terraform_data" "open_ssh_tunnels" {
  triggers_replace = local.command

  provisioner "local-exec" {
    quiet       = true
    interpreter = ["/bin/sh", "-c"]
    command     = local.command
  }
}

Debug Output

sudo terraform apply
Password:
terraform_data.open_ssh_tunnels: Refreshing state... [id=27baf4dc-2b54-3c3d-3564-72d45accc8a5]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # terraform_data.open_ssh_tunnels is tainted, so must be replaced
-/+ resource "terraform_data" "open_ssh_tunnels" {
      ~ id               = "27baf4dc-2b54-3c3d-3564-72d45accc8a5" -> (known after apply)
        # (1 unchanged attribute hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

terraform_data.open_ssh_tunnels: Destroying... [id=27baf4dc-2b54-3c3d-3564-72d45accc8a5]
terraform_data.open_ssh_tunnels: Destruction complete after 0s
terraform_data.open_ssh_tunnels: Creating...
terraform_data.open_ssh_tunnels: Provisioning with 'local-exec'...
terraform_data.open_ssh_tunnels (local-exec): local-exec: Executing: Suppressed by quiet=true
terraform_data.open_ssh_tunnels (local-exec): WARNING: Opening tunnel on port: 2230
terraform_data.open_ssh_tunnels (local-exec): WARNING: Tunnel is ready, connect on port 2230
terraform_data.open_ssh_tunnels (local-exec): WARNING: Ctrl + C to close
terraform_data.open_ssh_tunnels: Still creating... [10s elapsed]
terraform_data.open_ssh_tunnels (local-exec): Terminated: 15
terraform_data.open_ssh_tunnels (local-exec): Leaving
terraform_data.open_ssh_tunnels: Still creating... [20s elapsed]
terraform_data.open_ssh_tunnels: Still creating... [30s elapsed]
^C
Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...

Stopping operation...
╷
│ Error: execution halted
│ 
│ 
╵
╷
│ Error: execution halted
│ 
│ 
╵
╷
│ Error: execution halted

Once the output shows the WARNING saying the tunnel is ready, run the following pgrep/kill command to reproduce the problem:

pgrep -f '/bin/az network.*--port 2230$' | sudo xargs -r kill -15

"Terminated: 15" then appears in the Terraform log. The local-exec command ends, but the resource keeps reporting "Still creating..." indefinitely.

NOTE: I cannot reproduce this with a sleep. I also tried with "python3 -m http.server 8080". It looks like the az network bastion tunnel command behaves differently.

Expected Behavior

When the command executed by local-exec terminates, the provisioner should be considered complete (at least as a failure). In my case the process is terminated with the kill command.

terraform_data.open_ssh_tunnels -> Uses local-exec to run a command that opens the SSH tunnel through Azure Bastion and waits until killed
terraform_data.qa_installation -> Uses a connection to the tunnel's local endpoint to reach the target VMSS instance and executes a command using remote-exec
terraform_data.close_ssh_tunnels -> Retrieves the command's PID and kills it gracefully so that the az CLI can close the tunnel

(Picture from my local patch showing the solution working.)

Actual Behavior

The local-exec command terminates, but the resource that runs it (terraform_data.open_ssh_tunnels) keeps trying to create the resource indefinitely, and the only way to end the Terraform run is Ctrl+C.


Steps to Reproduce

The scenario to reproduce it involves an Azure Bastion host and a VM scale set. The goal is to connect to the VMSS instances (which have no public IP addresses) through the Azure Bastion host using the provisioner connection block.

I am using the following workflow:
1 - Create an SSH tunnel with the "az network bastion tunnel" command, which then acts as a local SSH tunnel endpoint between 127.0.0.1 and the target VMSS instance
2 - Connect to the tunnel through the local port and execute the command on the remote server using remote-exec
3 - Close the tunnel. This is the tricky part, as the only way to close the tunnel is to send Ctrl+C to the az network bastion tunnel command. I decided to use kill -15, which closes the process and the tunnel as expected.

Additional Context

I already have a patch for this that I will submit. The issue appears to come from https://github.com/hashicorp/terraform/blob/70fcc63c3643317ecc2d6b6a0485d47e7e8a55ea/internal/builtin/provisioners/local-exec/resource_provisioner.go#L182 and https://github.com/hashicorp/terraform/blob/70fcc63c3643317ecc2d6b6a0485d47e7e8a55ea/internal/builtin/provisioners/local-exec/resource_provisioner.go#L192

References

https://learn.microsoft.com/en-us/azure/bastion/connect-vm-native-client-linux#tunnel
https://pkg.go.dev/os/exec#Cmd.Run

jbardin commented 2 months ago

Hi @LaurentLesle,

I don't have the infrastructure to test the az command you have here right now, so we may need to find a different way to reproduce and debug the problem.

The problem likely comes from the fact that while you are killing the az process, you are not actually killing the process that Terraform is executing and that is not returning for some reason, probably because there's another blocked child process involved. Adding to the possible layers here, you are using bash -c inside of another shell command, so Terraform is actually running /bin/sh -c "sudo bash -c \"\n az network.... So you have sh running sudo running bash running az and so on.

Chances are that there's nothing Terraform can do on its own here: if the process that Terraform executed doesn't return, it has no way to determine what might be blocking it or what action it could take. You could try setting some of the usual shell failsafe options, like set -eo pipefail, to try to get the shell to exit more cleanly, or add -x to see all shell commands in the output. I would also remove the bash -c from the sudo command, to minimize the layers of processes involved in the execution.
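To illustrate how a lingering child can keep a caller blocked even after the shell it launched has exited, here is a minimal sketch in plain sh. It is not Terraform code. It assumes the caller reads the command's output through a pipe, which is how shell command substitution works and is one plausible reading of the lines linked in the report, though the thread does not confirm it. The `sleep 3` background job is a hypothetical stand-in for a leftover tunnel process.

```shell
#!/bin/sh
# Command substitution reads the command's stdout through a pipe and only
# returns at EOF, i.e. once every process holding the write end has closed it.

start=$(date +%s)
# The inner shell exits right after echo, but the background sleep inherited
# its stdout, so the pipe stays open until sleep finishes.
held=$(sh -c 'sleep 3 & echo wrapper-exited')
held_secs=$(( $(date +%s) - start ))

start=$(date +%s)
# Same command, except the background process no longer holds the pipe.
released=$(sh -c 'sleep 3 >/dev/null 2>&1 & echo wrapper-exited')
released_secs=$(( $(date +%s) - start ))

echo "inherited stdout: ${held_secs}s, redirected stdout: ${released_secs}s"
```

In both runs the inner shell prints its message and exits immediately; only the first caller has to wait for the background process, because that process still holds the output pipe.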

crw commented 2 months ago

It also bears mentioning that provisioners are not being actively supported at the moment, per https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. It may be worth bringing this to the community forum, where there are more people ready to help. The GitHub issues here are monitored only by a few core maintainers. Thanks!

LaurentLesle commented 1 month ago

I ran additional tests and realised I was killing only the az CLI process, leaving the child process that runs the tunnel (a Python program) orphaned. I identified this because the TCP port was not released. Killing the child process fixed the issue. Thanks for taking the time to share some input.
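A minimal sketch of that finding, using a hypothetical sh/sleep pair in place of the az CLI and its Python tunnel process (the real process tree is not shown in this thread, so the names here are assumptions). It relies on `pgrep -P`, which lists the direct children of a given PID.

```shell
#!/bin/sh
# Hypothetical stand-in for a wrapper CLI: a shell that starts a worker
# process and waits on it.
sh -c 'sleep 30 & wait' &
wrapper=$!
sleep 1   # give the wrapper time to start its worker

# pgrep -P lists the direct children of a PID; here that is the sleep worker.
worker=$(pgrep -P "$wrapper")

# Killing only the wrapper leaves the worker running as an orphan.
kill -15 "$wrapper"
wait "$wrapper" 2>/dev/null || true
if kill -0 "$worker" 2>/dev/null; then
  worker_after_wrapper_kill=alive
else
  worker_after_wrapper_kill=gone
fi

# Terminating the worker as well releases whatever it still holds.
kill -15 "$worker"
echo "worker after killing only the wrapper: $worker_after_wrapper_kill"
```

The child PID has to be looked up before the wrapper is killed, because an orphaned process is reparented and can no longer be found through the wrapper's PID.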
