Closed LaurentLesle closed 1 month ago
Hi @LaurentLesle,
I don't have the infrastructure to test the az command you have here right now, so we may need to find a different way to reproduce and debug the problem.
The problem likely comes from the fact that while you are killing the az process, you are not actually killing the process that Terraform is executing and that is not returning for some reason, probably because there's another blocked child process involved. Adding to the possible layers here, you are using bash -c inside of another shell command, so Terraform is actually running /bin/sh -c "sudo bash -c \"\n az network... So you have sh running sudo running bash running az, and so on.
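The layering problem can be reproduced without az. The following is a minimal sketch, assuming a plain sleep as a stand-in for the tunnel process and a hypothetical PID file used only for the demonstration: it signals the wrapper shell alone and then checks whether the child is still alive.

```shell
#!/bin/sh
# Hypothetical scratch file used only to pass the child's PID back to us.
pidfile=$(mktemp)

# Wrapper shell: starts a long-running child (a stand-in for the tunnel
# process), records the child's PID, then waits on it.
sh -c 'sleep 30 >/dev/null 2>&1 & echo $! > "$1"; wait' sh "$pidfile" &
wrapper=$!

sleep 1                       # give the wrapper time to write the PID file
child=$(cat "$pidfile")

# Signal only the wrapper, as a plain `kill <pid>` on the outer process would.
kill -TERM "$wrapper"
wait "$wrapper" 2>/dev/null || true

# The child was never signalled, so it is still alive (now orphaned).
if kill -0 "$child" 2>/dev/null; then
  survived=yes
else
  survived=no
fi
echo "child survived wrapper kill: $survived"

# Clean up: the child has to be signalled explicitly.
kill -TERM "$child" 2>/dev/null || true
rm -f "$pidfile"
```

The same thing can happen at any layer of sh, sudo, bash and az: a signal sent to one PID is not forwarded to its descendants unless that process chooses to forward it.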
Chances are that there's nothing Terraform can do on its own here: if the process that Terraform executed doesn't return, it has no way to determine what might be blocking it or what action it could take. You could try setting some of the usual shell failsafe options, like set -eo pipefail, to try and get the shell to exit more cleanly, or add -x to see all shell commands in the output. I would also remove the bash -c from the sudo command, to minimize the number of process layers involved in running the command.
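For reference, here is a small sketch of what those options change. It is not taken from the configuration in this issue; each case runs in its own bash process so the options do not leak into the calling script, and the variable names are only illustrative.

```shell
#!/bin/sh
# Default behaviour: a pipeline reports only the status of its last command.
status_plain=0
bash -c 'false | true' || status_plain=$?

# pipefail: the pipeline fails if any stage fails.
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

# errexit (-e): the script stops at the first failing command,
# so the echo after `false` never runs.
status_errexit=0
output_errexit=$(bash -c 'set -e; false; echo "not reached"') || status_errexit=$?

# xtrace (-x): each command is printed to stderr before it runs.
trace=$(bash -c 'set -x; echo hello' 2>&1)

echo "plain=$status_plain pipefail=$status_pipefail errexit=$status_errexit"
echo "errexit output: '$output_errexit'"
echo "$trace"
```

Note that pipefail is a bash option here, so it only helps if the script is actually interpreted by bash rather than by /bin/sh. These options make failures visible sooner; they do not by themselves terminate child processes that are still running.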
It also bears mentioning that provisioners are not being actively supported at the moment, per https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. It may be worth bringing this to the community forum, where there are more people ready to help. The GitHub issues here are monitored only by a few core maintainers. Thanks!
I did additional tests and realised I was killing only the az cli process, leaving its child process (the Python program that runs the tunnel) orphaned. I identified this because the TCP port was not released. Killing the child process fixed the issue. Thanks for taking the time to share some input.
Terraform Version
Terraform Configuration Files
Debug Output
When you see the WARNING in the output saying the tunnel is ready, you can execute the pgrep/kill to reproduce it.
You then see Terminated: 15 in the terraform logs. The local-exec ends, and the resource creation stays in an infinite loop.
NOTE: I cannot reproduce this with a sleep. I also tried with "python3 -m http.server 8080". It looks like the az network bastion tunnel command behaves differently.
Expected Behavior
When the command executed by the local-exec terminates, it is expected to be considered completed (at least with a failure). In my case the process is killed with the kill command.
terraform_data.open_ssh_tunnels -> Use a local-exec to run a command that opens the ssh tunnel through Azure Bastion and waits until it is killed
terraform_data.qa_installation -> Use a connection to the local endpoint of the tunnel to connect to the target vmss instance and execute a command using remote-exec
terraform_data.close_ssh_tunnels -> Retrieve the command PID and kill it nicely to allow az cli to close the tunnel
(picture from my local patch showing the solution working)
Actual Behavior
The behaviour we see is that the local-exec terminates, but the resource calling the local-exec (terraform_data.open_ssh_tunnels) goes into an infinite loop creating the resource, and the only way to terminate the terraform execution is with Ctrl+C.
Steps to Reproduce
The scenario to reproduce it is in the context of an Azure Bastion host and a VM scale set. The goal is to connect to the VMSS instances (which have no public IP addresses) through the Azure Bastion host using the provisioner connection block.
I am using the following workflow:
1 - Create an SSH tunnel with the "az network bastion tunnel" command, which then acts as a local SSH tunnel endpoint between 127.0.0.1 and the target VMSS instance
2 - Connect to the local tunnel through the local port and execute the command on the remote server using remote-exec
3 - Close the tunnel. This is where it is a bit tricky, as the only way to close the tunnel is to send a CTRL+C to the az network bastion tunnel command. I decided to use a kill -15, which closes the process and the tunnel as expected.
Additional Context
I already have a patch for it that I will submit, but it looks like the issue is coming from https://github.com/hashicorp/terraform/blob/70fcc63c3643317ecc2d6b6a0485d47e7e8a55ea/internal/builtin/provisioners/local-exec/resource_provisioner.go#L182 and https://github.com/hashicorp/terraform/blob/70fcc63c3643317ecc2d6b6a0485d47e7e8a55ea/internal/builtin/provisioners/local-exec/resource_provisioner.go#L192
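One mechanism that would be consistent with this behaviour, offered as an assumption rather than something confirmed in this thread: a caller that reads a command's output through a pipe keeps waiting until every process holding the write end has closed it, so a surviving child that inherited the pipe can keep the caller blocked after the command itself has exited. The sketch below illustrates the general effect with shell command substitution standing in for the caller; it does not exercise Terraform or az, and the two-second sleep is only a placeholder for a long-lived child.

```shell
#!/bin/sh
# Case 1: the inner sh exits immediately, but its background sleep
# inherited the output pipe, so the reader waits until sleep exits.
start=$(date +%s)
out_held=$(sh -c 'sleep 2 & echo "parent done"')
held=$(( $(date +%s) - start ))

# Case 2: the background child's output is redirected away from the pipe,
# so the reader sees end-of-file as soon as the inner sh exits.
start=$(date +%s)
out_released=$(sh -c 'sleep 2 >/dev/null 2>&1 & echo "parent done"')
released=$(( $(date +%s) - start ))

echo "inherited pipe: waited ${held}s for '$out_held'"
echo "redirected:     waited ${released}s for '$out_released'"
```

In both cases the inner shell prints its message and exits right away; only the first case makes the reader wait for the background child. If the same thing is happening here, it would also explain why a plain sleep does not reproduce the problem: killing it leaves no surviving child holding the pipe.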
References
https://learn.microsoft.com/en-us/azure/bastion/connect-vm-native-client-linux#tunnel
https://pkg.go.dev/os/exec#Cmd.Run