hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io
Other
42.85k stars 9.56k forks source link

Unable to launch powershell script as a background process on EC2 instance in AWS using terraform "remote-exec" provisioner #34384

Open papaneeds opened 11 months ago

papaneeds commented 11 months ago

Terraform Version

Terraform v1.6.3
on windows_amd64
+ provider registry.terraform.io/hashicorp/aws v5.9.0

Terraform Configuration Files

resource "aws_instance" "training_node" {
  instance_type          = "t3.small"
  ami = "ami-abcdefgxxxxx"
  key_name               = "trainingNode-keyPair"
  vpc_security_group_ids = [aws_security_group.training_sg.id]
  subnet_id              = aws_subnet.training_public_subnet.id
  source_dest_check      = false

  root_block_device {
    volume_size = 30
  }

  provisioner "file" {
    source = "./looper1.ps1"
    destination = "C:/Users/Administrator/looper1.ps1"
  }

  provisioner "file" {
    source = "./looper2.ps1"
    destination = "C:/Users/Administrator/looper2.ps1"
  }

  # Unblock all files so that you can execute them
  provisioner "remote-exec" {
    inline = [
      "powershell \"Get-Childitem C:\\Users\\Administrator -Recurse | Unblock-File\""
    ]
  }

  provisioner "remote-exec" {
    inline = [
      "start /b \"looper1\" \"powershell.exe\" \"C:\\Users\\Administrator\\looper1.ps1\"",
    ]
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Start-Process -FilePath \"powershell\" -ArgumentList C:\\Users\\Administrator\\looper2.ps1",
    ]
  }

  connection {
    type = "ssh"
    port = "22"
    target_platform = "windows"
    user = "Administrator"
    host = self.public_ip
    timeout = "5m"
    private_key = file("trainingNode-keyPair.pem")
  }

  tags = {
    Name = "training-node"
  }
}

Debug Output

aws_instance.training_node: Still creating... [3m30s elapsed]
2023-12-08T14:05:57.312-0500 [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/hashicorp/aws\"] (close)"
2023-12-08T14:05:57.385-0500 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "aws_instance.training_node"
2023-12-08T14:05:58.272-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/Users/Administrator
2023-12-08T14:05:58.330-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:05:58.331-0500 [DEBUG] Beginning file upload...
2023-12-08T14:05:58.517-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:05:58.517-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:05:58.549-0500 [ERROR] scp stderr: "Sink: C0644 219 looper1.ps1\r\n"
2023-12-08T14:05:58.549-0500 [TRACE] applyProvisioners: provisioning aws_instance.training_node with "file"
2023-12-08T14:05:58.549-0500 [TRACE] terraform.contextPlugins: Initializing provisioner "file" to read its schema        
2023-12-08T14:05:58.557-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
2023-12-08T14:05:58.557-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
aws_instance.training_node: Provisioning with 'file'...
2023-12-08T14:05:58.559-0500 [INFO]  using private key for authentication
2023-12-08T14:05:58.559-0500 [DEBUG] Connecting to 3.99.142.220:22 for SSH
2023-12-08T14:05:58.579-0500 [DEBUG] Connection established. Handshaking for user Administrator
2023-12-08T14:05:58.869-0500 [DEBUG] starting ssh KeepAlives
2023-12-08T14:05:58.891-0500 [DEBUG] opening new ssh session
2023-12-08T14:05:59.001-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/Users/Administrator
2023-12-08T14:05:59.102-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:05:59.103-0500 [DEBUG] Beginning file upload...
2023-12-08T14:05:59.205-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:05:59.206-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:05:59.233-0500 [ERROR] scp stderr: "Sink: C0644 219 looper2.ps1\r\n"
2023-12-08T14:05:59.235-0500 [TRACE] applyProvisioners: provisioning aws_instance.training_node with "remote-exec"       
2023-12-08T14:05:59.235-0500 [TRACE] terraform.contextPlugins: Initializing provisioner "remote-exec" to read its schema 
2023-12-08T14:05:59.236-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
2023-12-08T14:05:59.237-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
aws_instance.training_node: Provisioning with 'remote-exec'...
2023-12-08T14:05:59.241-0500 [INFO]  using private key for authentication
aws_instance.training_node (remote-exec): Connecting to remote host via SSH...
aws_instance.training_node (remote-exec):   Host: 3.99.142.220
aws_instance.training_node (remote-exec):   User: Administrator
aws_instance.training_node (remote-exec):   Password: false
aws_instance.training_node (remote-exec):   Private key: true
aws_instance.training_node (remote-exec):   Certificate: false
aws_instance.training_node (remote-exec):   SSH Agent: false
aws_instance.training_node (remote-exec):   Checking Host Key: false
aws_instance.training_node (remote-exec):   Target Platform: windows
2023-12-08T14:05:59.253-0500 [DEBUG] Connecting to 3.99.142.220:22 for SSH
2023-12-08T14:05:59.275-0500 [DEBUG] Connection established. Handshaking for user Administrator
aws_instance.training_node (remote-exec): Connected!
2023-12-08T14:05:59.526-0500 [DEBUG] starting ssh KeepAlives
2023-12-08T14:05:59.533-0500 [DEBUG] opening new ssh session
2023-12-08T14:05:59.633-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:05:59.778-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:05:59.779-0500 [DEBUG] Beginning file upload...
2023-12-08T14:05:59.802-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:05:59.803-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:05:59.832-0500 [ERROR] scp stderr: "Sink: C0644 74 terraform_1912908404.cmd\r\n"
2023-12-08T14:05:59.832-0500 [DEBUG] opening new ssh session
2023-12-08T14:05:59.853-0500 [DEBUG] starting remote command: C:/windows/temp/terraform_1912908404.cmd

aws_instance.training_node (remote-exec): administrator@EC2AMAZ-KO02ITT C:\Users\administrator>powershell "Get-Childitem 
C:\Users\Administrator -Recurse | Unblock-File"
2023-12-08T14:06:00.794-0500 [DEBUG] remote command exited with '0': C:/windows/temp/terraform_1912908404.cmd
2023-12-08T14:06:00.802-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:00.825-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:06:00.931-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:06:00.934-0500 [DEBUG] Copying input data into temporary file so we can read the length
2023-12-08T14:06:00.942-0500 [DEBUG] Beginning file upload...
2023-12-08T14:06:01.056-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:06:01.057-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:06:01.089-0500 [ERROR] scp stderr: "Sink: C0644 0 terraform_1912908404.cmd\r\n"
2023-12-08T14:06:01.089-0500 [TRACE] applyProvisioners: provisioning aws_instance.training_node with "remote-exec"
2023-12-08T14:06:01.090-0500 [TRACE] terraform.contextPlugins: Initializing provisioner "remote-exec" to read its schema 
2023-12-08T14:06:01.091-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
2023-12-08T14:06:01.091-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
aws_instance.training_node: Provisioning with 'remote-exec'...
2023-12-08T14:06:01.093-0500 [INFO]  using private key for authentication
aws_instance.training_node (remote-exec): Connecting to remote host via SSH...
aws_instance.training_node (remote-exec):   Host: 3.99.142.220
aws_instance.training_node (remote-exec):   User: Administrator
aws_instance.training_node (remote-exec):   Password: false
aws_instance.training_node (remote-exec):   Private key: true
aws_instance.training_node (remote-exec):   Certificate: false
aws_instance.training_node (remote-exec):   SSH Agent: false
aws_instance.training_node (remote-exec):   Checking Host Key: false
aws_instance.training_node (remote-exec):   Target Platform: windows
2023-12-08T14:06:01.096-0500 [DEBUG] Connecting to 3.99.142.220:22 for SSH
2023-12-08T14:06:01.116-0500 [DEBUG] Connection established. Handshaking for user Administrator
aws_instance.training_node (remote-exec): Connected!
2023-12-08T14:06:01.400-0500 [DEBUG] starting ssh KeepAlives
2023-12-08T14:06:01.401-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:01.515-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:06:01.555-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:06:01.563-0500 [DEBUG] Beginning file upload...
2023-12-08T14:06:01.649-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:06:01.650-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:06:01.681-0500 [ERROR] scp stderr: "Sink: C0644 73 terraform_106553513.cmd\r\n"
2023-12-08T14:06:01.681-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:01.702-0500 [DEBUG] starting remote command: C:/windows/temp/terraform_106553513.cmd

aws_instance.training_node (remote-exec): administrator@EC2AMAZ-KO02ITT C:\Users\administrator>start /b "looper1" "powershell.exe" "C:\Users\Administrator\looper1.ps1"
2023-12-08T14:06:01.897-0500 [DEBUG] remote command exited with '0': C:/windows/temp/terraform_106553513.cmd
2023-12-08T14:06:01.897-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:01.918-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:06:01.951-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:06:01.953-0500 [DEBUG] Copying input data into temporary file so we can read the length
2023-12-08T14:06:01.961-0500 [DEBUG] Beginning file upload...
2023-12-08T14:06:02.045-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:06:02.046-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:06:02.079-0500 [ERROR] scp stderr: "Sink: C0644 0 terraform_106553513.cmd\r\n"
2023-12-08T14:06:02.080-0500 [TRACE] applyProvisioners: provisioning aws_instance.training_node with "remote-exec"
2023-12-08T14:06:02.080-0500 [TRACE] terraform.contextPlugins: Initializing provisioner "remote-exec" to read its schema 
2023-12-08T14:06:02.081-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
2023-12-08T14:06:02.081-0500 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/aws" to read its schema
aws_instance.training_node: Provisioning with 'remote-exec'...
2023-12-08T14:06:02.084-0500 [INFO]  using private key for authentication
aws_instance.training_node (remote-exec): Connecting to remote host via SSH...
aws_instance.training_node (remote-exec):   Host: 3.99.142.220
aws_instance.training_node (remote-exec):   User: Administrator
aws_instance.training_node (remote-exec):   Password: false
aws_instance.training_node (remote-exec):   Private key: true
aws_instance.training_node (remote-exec):   Certificate: false
aws_instance.training_node (remote-exec):   SSH Agent: false
aws_instance.training_node (remote-exec):   Checking Host Key: false
aws_instance.training_node (remote-exec):   Target Platform: windows
2023-12-08T14:06:02.087-0500 [DEBUG] Connecting to 3.99.142.220:22 for SSH
2023-12-08T14:06:02.108-0500 [DEBUG] Connection established. Handshaking for user Administrator
2023-12-08T14:06:02.322-0500 [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/hashicorp/aws\"] (close)"
2023-12-08T14:06:02.399-0500 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "aws_instance.training_node"
aws_instance.training_node (remote-exec): Connected!
2023-12-08T14:06:02.401-0500 [DEBUG] starting ssh KeepAlives
2023-12-08T14:06:02.402-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:02.580-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:06:02.689-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:06:02.690-0500 [DEBUG] Beginning file upload...
2023-12-08T14:06:02.784-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:06:02.784-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:06:02.852-0500 [ERROR] scp stderr: "Sink: C0644 101 terraform_781327820.cmd\r\n"
2023-12-08T14:06:02.853-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:02.875-0500 [DEBUG] starting remote command: C:/windows/temp/terraform_781327820.cmd

aws_instance.training_node (remote-exec): administrator@EC2AMAZ-KO02ITT C:\Users\administrator>powershell.exe Start-Process -FilePath "powershell" -ArgumentList C:\Users\Administrator\looper2.ps1
2023-12-08T14:06:04.162-0500 [DEBUG] remote command exited with '0': C:/windows/temp/terraform_781327820.cmd
2023-12-08T14:06:04.163-0500 [DEBUG] opening new ssh session
2023-12-08T14:06:04.183-0500 [DEBUG] Starting remote scp process:  "scp" -vt C:/windows/temp
2023-12-08T14:06:04.215-0500 [DEBUG] Started SCP session, beginning transfers...
2023-12-08T14:06:04.216-0500 [DEBUG] Copying input data into temporary file so we can read the length
2023-12-08T14:06:04.224-0500 [DEBUG] Beginning file upload...
2023-12-08T14:06:04.418-0500 [DEBUG] SCP session complete, closing stdin pipe.
2023-12-08T14:06:04.419-0500 [DEBUG] Waiting for SSH session to complete.
2023-12-08T14:06:04.444-0500 [ERROR] scp stderr: "Sink: C0644 0 terraform_781327820.cmd\r\n"
2023-12-08T14:06:04.445-0500 [TRACE] GRPCProvider: GetProviderSchema
2023-12-08T14:06:04.446-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_instance.training_node
2023-12-08T14:06:04.447-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for aws_instance.training_node
aws_instance.training_node: Creation complete after 3m37s [id=i-0e7d659e1a0975eac]
2023-12-08T14:06:04.450-0500 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2023-12-08T14:06:04.455-0500 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 368
2023-12-08T14:06:04.456-0500 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2023-12-08T14:06:04.510-0500 [TRACE] vertex "aws_instance.training_node": visit complete
2023-12-08T14:06:04.510-0500 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": starting visit (*terraform.graphNodeCloseProvider)
2023-12-08T14:06:04.511-0500 [TRACE] GRPCProvider: Close
2023-12-08T14:06:04.512-0500 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-12-08T14:06:04.572-0500 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.9.0/windows_amd64/terraform-provider-aws_v5.9.0_x5.exe pid=24016
2023-12-08T14:06:04.573-0500 [DEBUG] provider: plugin exited
2023-12-08T14:06:04.573-0500 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": visit complete  
2023-12-08T14:06:04.574-0500 [TRACE] vertex "root": starting visit (*terraform.nodeCloseModule)
2023-12-08T14:06:04.575-0500 [TRACE] vertex "root": visit complete
2023-12-08T14:06:04.575-0500 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2023-12-08T14:06:04.576-0500 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 369
2023-12-08T14:06:04.577-0500 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2023-12-08T14:06:04.606-0500 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2023-12-08T14:06:04.610-0500 [TRACE] statemgr.Filesystem: unlocked by closing terraform.tfstate

Apply complete! Resources: 10 added, 0 changed, 0 destroyed.

Expected Behavior

The powershell scripts looper1.ps1 and looper2.ps1 should have been run as separate processes on the EC2 Windows instance.

Actual Behavior

The powershell scripts looper1.ps1 and looper2.ps1 do not appear to run at all.

Steps to Reproduce

Run "terraform apply"

Additional Context

I am trying to execute a powershell script as a separate process on an EC2 windows instance using "remote-exec". However, terraform does not seem to be able to launch the powershell script as a separate process - the powershell script does not run.

To illustrate the problem I have created two long-running powershell scripts called "looper1.ps1" and "looper2.ps1" which simply loop forever and write to log files in C:\Users\Administrator" every 5s (please see scripts below. I can tell whether the scripts are running or not by the presence (or absence) of the log files (called "Looper1.log" and "Looper2.log" respectively.

In the terraform snippet from my "main.tf" (which is included above) I try to execute the "looper1.ps1" script as a separate process on the EC2 windows instance using remote-exec and the command:

start /b "looper1" "powershell.exe" "C:\Users\Administrator\looper1.ps1"

I alsoI try to execute the "looper2.ps1" script as a separate process on the EC2 windows instance using remote-exec and the command:

powershell.exe Start-Process -FilePath "powershell" -ArgumentList C:\Users\Administrator\looper2.ps1

In the terraform debug output it looks like both of these commands execute successfully (they both exit with code 0). However, when I subsequently log in to the EC2 windows instance I do not see the log files "Looper1.log" and "Looper2.log" (which should be updating every 5s if looper1.ps1 and looper2.ps1 are running), nor do I see any evidence that they are running if I execute a "tasklist" command.

When I am logged in to the EC2 instance, if I open a cmd window and manually issue the command:

start /b "looper1" "powershell.exe" "C:\Users\Administrator\looper1.ps1"

then I immediately see the "Looper1.log" file created and updated every 5s. Issuing a "tasklist" command shows the presence of a new powershell.exe process belonging to "looper1.ps1".

In the same cmd window if I manually issue the command:

powershell.exe Start-Process -FilePath "powershell" -ArgumentList C:\Users\Administrator\looper2.ps1

Then a new powershell window opens, the command returns control to the cmd window, and I immediately see the "Looper2.log" file created and updated every 5s. Issuing a "tasklist" command shows the presence of a new powershell process belonging to "looper2.ps1".

So, it seems that I can manually issue these commands and start the powershell scripts as separate processes, but issuing the same commands through terraform "remote-exec" does nothing.

I have also tried starting the powershell scripts from terraform using "remote-exec" but not as separate processes (ie; in the foreground). The relevant commands are:

start "looper1" "powershell.exe" "C:\Users\Administrator\looper1.ps1"

and

powershell.exe C:\Users\Administrator\looper2.ps1

The first command is (start "looper1"...) is executed by terraform and returns with exit code 0. However, as before, it does not seem to do anything on the EC2 windows instance (in the sense that when I log in to the remote EC2 windows instance there is no "Looper1.log" file created), and the "looper1.ps1" powershell script does not appear to be running on the EC2 windows instance.

When terraform hits the second command (powershell.exe C:\Users\Administrator\looper2.ps1) terraform executes it on the remote EC2 windows instance and then waits until the command finishes executing (which means that it waits forever because the powershell script is running in the foreground and loops forever and never returns). When I log in to the EC2 windows instance I do see the presence of the "Looper2.log" file and I do see a powershell process running in tasklist associated with "looper2.ps1". So, it seems like terraform is able to run "looper2.ps1" in the foreground (but of course, terraform never returns and keeps running forever too!).

So, it looks like, for some reason, terraform is not successful launching the powershell scripts as separate processes using "remote-exec" (even though I can issue exactly the same commands in a cmd window on the EC2 instance to successfully launch the powershell scripts as separate processes).

You may ask why I want to run the powershell scripts as separate processes in the first place. It is because I the powershell scripts are executing and monitoring long-running installation processes and I want to build several EC2 windows instances in parallel (and not serially).

And, finally, I have been successful launching the powershell scripts as separate processes upon EC2 instance instantiation by embedding the powershell scripts in the AMI and then using EC2Launch (and properly configuring agent-config.yml). However, embedding the powershell scripts in the AMI and using EC2Launch is a brittle and error-prone workaround. I would much rather do it flexibly using terraform and "remote-exec" - it just doesn't seem to work.

this is looper1.ps1

$looperLogFile = "C:\Users\Administrator\Looper1.log" $sleepTime = 5 while ($true) { Add-Content -Path $looperLogFile -Value "Looper1. Waiting $sleepTime." Start-Sleep -s $sleepTime }

this is looper2.ps2

$looperLogFile = "C:\Users\Administrator\Looper2.log" $sleepTime = 5 while ($true) { Add-Content -Path $looperLogFile -Value "Looper2. Waiting $sleepTime." Start-Sleep -s $sleepTime }

References

No response

crw commented 11 months ago

Thanks for this bug report, @papaneeds. While a maintainer may weigh in on the issue, I'll note that provisioners are not receiving active development, per https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners.

This looks like a fairly sophisticated use case, you may also find help from the community forum.

Thanks!

mattlqx commented 6 months ago

I found this to be a manifestation of what is described in this SO post. In general, processes spawned from sshd are cleaned up on disconnection.

The workaround described in it however did work for me: using Cygwin's start nohup to call the script. One caveat was that I had to sleep long enough after calling the script to prevent disconnection until the long-running process was spawned.