Dan-JoeLopez opened this issue 7 years ago
I've just tested and found that I get the same behavior when using remote-exec
to run the script as a file on the instance.
data "template_file" "provision" {
template = "${file("${path.module}/templates/provision.bat.tpl")}"
vars {
chef_ver = "${ var.chef_client_version }"
cookbook = "${ var.cookbook }"
recipe = "${ var.recipe }"
}
}
resource "null_resource" "prep_script" {
count = "${ var.server_count }"
depends_on = ["openstack_compute_floatingip_associate_v2.attach_corp_ip"]
connection {
host = "${element(openstack_networking_floatingip_v2.get_corp_ip.*.address, count.index)}"
type = "winrm"
user = "chef"
password = "${ var.chef_pass }"
insecure = true
}
provisioner "file" {
content = "${ data.template_file.provision.rendered }"
destination = "c:\\chef\\repo\\provision.bat"
}
}
resource "null_resource" "Prepare_Chef" {
count = "${ var.server_count }"
depends_on = ["null_resource.prep_script"]
connection {
host = "${element(openstack_networking_floatingip_v2.get_corp_ip.*.address, count.index)}"
type = "winrm"
user = "chef"
password = "${ var.chef_pass }"
insecure = true
}
provisioner "remote-exec" {
inline = [
"c:\\chef\\repo\\provision.bat"
]
}
}
I'm seeing this as well, not only on Windows but on Linux too.
resource "aws_instance" "linux" {
security_groups = ["${aws_security_group.jenkins_linux.name}"]
ami = "${var.linux_ami}"
instance_type = "${var.linux_type}"
associate_public_ip_address = true
key_name = "victor-ssh-key"
count = "${var.linux_count}"
connection {
type = "ssh"
user = "ubuntu"
}
provisioner "file" {
content = "${data.template_file.jenkins_worker_service.rendered}"
destination = "/tmp/swarm.service"
}
provisioner "remote-exec" {
inline = [
"sudo apt update",
"sudo apt install --yes wget htop default-jre",
# TODO should be copied over instead of downloaded
"cd /tmp && wget https://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/swarm-client/${var.swarm_version}/swarm-client-${var.swarm_version}.jar",
"sudo mv /tmp/swarm.service /etc/systemd/system/swarm.service",
"sudo systemctl start swarm",
]
}
}
With this, the remote-exec provisioner doesn't execute the last two steps; instead it exits successfully (???) after the wget.
Debug output:
aws_instance.linux (remote-exec): --2017-10-31 18:36:14-- https://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/swarm-client/3.6/swarm-client-3.6.jar
aws_instance.linux (remote-exec): Resolving repo.jenkins-ci.org (repo.jenkins-ci.org)... 130.211.20.35
aws_instance.linux (remote-exec): Connecting to repo.jenkins-ci.org (repo.jenkins-ci.org)|130.211.20.35|:443... connected.
aws_instance.linux (remote-exec): HTTP request sent, awaiting response... 200 OK
aws_instance.linux (remote-exec): Length: 1620623 (1.5M) [application/java-archive]
aws_instance.linux (remote-exec): Saving to: ‘swarm-client-3.6.jar’
aws_instance.linux (remote-exec): swarm 0% 0 --.-KB/s
aws_instance.linux (remote-exec): swarm-clien 100% 1.54M --.-KB/s in 0.1s
aws_instance.linux (remote-exec): 2017-10-31 18:36:14 (12.1 MB/s) - ‘swarm-client-3.6.jar’ saved [1620623/1620623]
2017-10-31T18:36:14.848Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 remote command exited with '0': /tmp/terraform_682535596.sh
2017-10-31T18:36:14.848Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 opening new ssh session
2017-10-31T18:36:14.921Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 Starting remote scp process: scp -vt /tmp
2017-10-31T18:36:14.995Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 Started SCP session, beginning transfers...
2017-10-31T18:36:14.996Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 Copying input data into temporary file so we can read the length
2017-10-31T18:36:14.998Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:14 Beginning file upload...
2017-10-31T18:36:15.072Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:15 SCP session complete, closing stdin pipe.
2017-10-31T18:36:15.072Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:15 Waiting for SSH session to complete.
2017-10-31T18:36:15.146Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:15 scp stderr (length 37): Sink: C0644 0 terraform_682535596.sh
aws_instance.linux: Creation complete after 1m4s (ID: i-0335fd31dbfa9d2df)
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
2017/10/31 18:36:15 [DEBUG] plugin: waiting for all plugin processes to complete...
Outputs:
linux_ips = [
34.230.x.x
]
windows_ips = []
2017-10-31T18:36:15.177Z [DEBUG] plugin.terraform: file-provisioner (internal) 2017/10/31 18:36:15 [DEBUG] plugin: waiting for all plugin processes to complete...
2017-10-31T18:36:15.178Z [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2017/10/31 18:36:15 [DEBUG] plugin: waiting for all plugin processes to complete...
2017-10-31T18:36:15.180Z [DEBUG] plugin: plugin process exited: path=/usr/bin/terraform
2017-10-31T18:36:15.180Z [DEBUG] plugin: plugin process exited: path=/root/jenkins/worker/.terraform/plugins/linux_amd64/terraform-provider-aws_v1.1.0_x4
2017-10-31T18:36:15.182Z [WARN ] plugin: error closing client during Kill: err="unexpected EOF"
2017-10-31T18:36:15.182Z [DEBUG] plugin: plugin process exited: path=/usr/bin/terraform
I'm guessing it's the plugin: error closing client during Kill: err="unexpected EOF"
that is the trouble, but I have no idea whether it's a problem with my configuration or with Terraform. For the record, the same steps did work on a different machine (my desktop at home, rather than a DO droplet).
After some more debugging, I tried downgrading Terraform and the AWS provider, but without success; the issue currently happens under all conditions. What used to work on my desktop at home no longer works, which makes me believe something changed on the AWS side of things.
This is probably the same issue as https://github.com/hashicorp/terraform/issues/15963. It appears that for all provisioners, on all platforms, once Terraform decides "Creation complete", the rest of provisioning is halted. In my case, I have two inline commands in a remote-exec provisioner. The first command is supposed to block until an AWS userdata script completes and writes a signaling file to a temporary directory (this takes about 13 minutes). This approach is suggested by @calvn in https://github.com/hashicorp/terraform/issues/4668.
However, what actually happens is that the first inline remote-exec command never completes, as you can see in the log below (the "Setup not complete. Retrying..." message comes from the first command). The second command is never called. As soon as the "Creation complete" message appears, the remote-exec is stopped and no more commands execute.
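Concretely, the blocking pattern looks like this minimal sketch (the resource names, signal-file path, and connection details are illustrative assumptions, not the actual config from this report):

resource "null_resource" "wait_for_userdata" {
  connection {
    type        = "ssh"
    host        = "${aws_instance.example.public_ip}"
    user        = "ec2-user"
    private_key = "${file(var.ssh_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      # First command: poll until the userdata script writes its signal file.
      "until [ -f /tmp/setup-complete ]; do echo 'Setup not complete. Retrying...'; sleep 10; done",
      # Second command: should run once setup has finished (but never does here).
      "echo 'Setup complete, continuing provisioning'",
    ]
  }
}

The log excerpt that follows shows the first command looping and then being cut off at "Creation complete".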
aws_instance.windows: Still creating... (11m0s elapsed)
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows: Still creating... (11m10s elapsed)
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows: Still creating... (11m20s elapsed)
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows (remote-exec): Setup not complete. Retrying...
aws_instance.windows: Creation complete after 11m25s (ID: i-0f7160b825268416g)

Apply complete! Resources: 5 added, 0 changed, 1 destroyed.

Outputs:
ami2016 = ami-0a792a70
[Container] 2018/01/11 16:12:49 Running command
[Container] 2018/01/11 16:12:49 Running command
[Container] 2018/01/11 16:12:49 Running command
[Container] 2018/01/11 16:12:49 Running command
[Container] 2018/01/11 16:12:49 Running command
...
[Container] 2018/01/11 16:12:49 Phase complete: BUILD Success: true
[Container] 2018/01/11 16:12:49 Phase context status code: Message:
This is my solution:
provisioner "remote-exec" {
inline = [
"tail -f /var/log/cloud-init-output.log | sed '/SOME KEYWORD/q'"
]
}
connection {
type = "ssh"
host = "${aws_instance.instance.public_ip}"
user = "ec2-user"
private_key = "${file(var.ssh_key_path)}"
}
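For anyone wondering why this blocks correctly: sed '/SOME KEYWORD/q' quits as soon as the keyword appears in the cloud-init log, and tail -f then exits the next time it writes to the closed pipe, so the single inline command only returns once userdata has finished writing that keyword.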
Still seeing this issue with Terraform 1.3.2 and AWS provider 4.56.0. In my scenario it happens seemingly at random, and if I run apply again it generally works. I tried the keep-alive trick outlined in https://github.com/hashicorp/terraform/issues/18517 and it made no difference.
When using the remote-exec provisioner, it sometimes stops in the middle of the script and exits. The only way I have found to get around this is to split the script into several null_resource blocks and execute the script one bit at a time (a sketch of that workaround is shown at the end of this report).

Terraform Version
Terraform v0.10.3

Terraform Configuration Files
This is what I would like to be able to do:
This is what I have that works, but I should be able to do all of this in one shot!

Expected Behavior
The remote-exec block should run all of the commands.

Actual Behavior
The remote-exec block stops part way through and shows no error.

Please let me know if there is some limitation or something that I am missing.
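For readers looking for the shape of that split-up workaround, here is a minimal sketch (the resource names, host reference, SSH details, and commands are placeholders assumed for illustration, not the reporter's actual configuration). Each null_resource runs one chunk of the script over its own connection, and depends_on forces the chunks to run in order:

resource "null_resource" "step_1" {
  connection {
    type        = "ssh"
    host        = "${aws_instance.example.public_ip}"
    user        = "ubuntu"
    private_key = "${file(var.ssh_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      # First chunk of the script.
      "sudo apt update",
    ]
  }
}

resource "null_resource" "step_2" {
  # Force ordering so step_2 runs only after step_1 succeeds.
  depends_on = ["null_resource.step_1"]

  connection {
    type        = "ssh"
    host        = "${aws_instance.example.public_ip}"
    user        = "ubuntu"
    private_key = "${file(var.ssh_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      # Second chunk of the script.
      "sudo apt install --yes wget",
    ]
  }
}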