hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Chef Provisioner Failing #17638

Closed: angelosanramon closed this issue 6 years ago

angelosanramon commented 6 years ago

Terraform Version

Terraform v0.11.4

Terraform Configuration Files

  connection {
    type        = "ssh"
    user        = "centos"
    private_key = "${file("ssh_private_key.pem")}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo yum -y install ntp && sudo ntpdate -s time.nist.gov",
      "sudo yum -y install python-pip",
      "sudo hostname ${var.hostname}",
      "sudo mkdir /etc/chef && sudo chmod 777 /etc/chef",
      "sudo echo -e '${file("validation.pem")}' > /etc/chef/validation.pem",
      "sudo chmod 600 /etc/chef/validation.pem",
      "sudo chmod 755 /etc/chef",
      "sudo mkdir -p /etc/chef/ohai/hints && sudo chmod 777 /etc/chef/ohai/hints",
      "sudo echo -e '${file("pokey_hints.json")}' > /etc/chef/ohai/hints/pokey_hints.json",
      "sudo chmod 755 /etc/chef/ohai/hints",
      "sudo chown root:root /etc/chef/ohai/hints/pokey_hints.json",
      "sudo chmod 600 /etc/chef/ohai/hints/pokey_hints.json",
      "sudo touch /etc/chef/ohai/hints/ec2.json"
    ]
  }

  provisioner "chef" {
    use_policyfile  = true
    policy_group    = "${var.environment}"
    policy_name     = "${var.project}"
    log_to_file     = false
    server_url      = "${var.chef_url}"
    user_name       = "${var.chef_org}-validator"
    user_key        = "${file("validation.pem")}"
    node_name       = "${var.hostname}"
    recreate_client = true
    ssl_verify_mode = ":verify_none"
    version         = "${var.chef_version}"
  }
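An aside on the inline commands in the remote-exec block above: `sudo echo '...' > /etc/chef/validation.pem` only works because the directory was first made world-writable, since the `>` redirection is performed by the unprivileged login shell, not by sudo. A minimal sketch of the conventional alternative using `sudo tee` (illustrative only; `write_as_root` is a hypothetical helper, not part of the original report):

```shell
# Illustrative helper: the redirection in `sudo echo ... > file` is opened by
# the calling user's shell, so it fails on root-owned directories. Piping
# through `tee` run under sudo performs the write with root privileges.
write_as_root() {
  sudo tee "$1" > /dev/null   # file contents are read from stdin
}

# usage sketch:
#   printf '%s\n' "$pem_contents" | write_as_root /etc/chef/validation.pem
```

With this pattern, the `chmod 777` / `chmod 755` dance on `/etc/chef` would not be needed.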

Debug Output

module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server: Provisioning with 'chef'...
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef): Connecting to remote host via SSH...
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   Host: 192.168.1.50
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   User: centos
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   Password: false
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   Private key: true
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   SSH Agent: true
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef):   Checking Host Key: false
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef): Connected!
module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server (chef): Cleanup user key...
Releasing state lock. This may take a few moments...

Error: Error applying plan:

1 error(s) occurred:

* module.x1answers-classic-cd-xre-0101166-gig9g.aws_instance.server: io: read/write on closed pipe

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Crash Output

Expected Behavior

Chef provisioning should complete successfully.

Actual Behavior

The Chef provisioner starts but unexpectedly disconnects once connected. The Chef provisioner works fine with Terraform v0.11.3, and the remote-exec provisioner completes successfully.

Steps to Reproduce

Additional Context

Running through a Python wrapper script that uses the python_terraform module.

References

Lasering commented 6 years ago

I'm also experiencing this issue.

wenwolf commented 6 years ago

Same here with 0.11.4. Edit: using 0.11.3 works fine (I was in the process of upgrading from 0.8.8 to 0.11.4; looks like I'll have to live with 0.11.3 for now).

jkerak commented 6 years ago

I'm also experiencing this on 0.11.4, and can't switch back to 0.11.3 since my remote state has already been created with 0.11.4.

rberlind commented 6 years ago

I still see the same problem even after upgrading to Terraform 0.11.5 when using the Chef provisioner against an aws_instance resource:

Error: Error applying plan:

1 error(s) occurred:

resource "aws_instance" "chef-node" {
  ami                         = "${data.aws_ami.ubuntu.id}"
  instance_type               = "${var.instance_type}"
  subnet_id                   = "${var.subnet_id}"
  key_name                    = "${var.key_name}"
  vpc_security_group_ids      = ["${aws_security_group.chef-node.id}"]
  associate_public_ip_address = true

  tags {
    Name = "${var.environment_name}-chef-node"
  }

  user_data = "${data.template_file.role-id.rendered}"

  provisioner "chef" {
    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = "${file(var.ec2_pem)}"
    }

    node_name = "chef-node-test"

    //client_options = ["log_level :debug"]
    server_url = "${var.chef_server_address}"
    user_name  = "demo-admin"
    user_key   = "${data.aws_s3_bucket_object.chef_bootstrap_pem.body}"

    run_list                = ["recipe[vault_chef_approle_demo]"]
    recreate_client         = true
    fetch_chef_certificates = true
    ssl_verify_mode         = ":verify_none"
  }
}

This code worked fine with 0.11.3, but broke with 0.11.4 and is not fixed by 0.11.5. Reverting to 0.11.3 without changing the code allowed it to work again.

Lasering commented 6 years ago

Not fixed in 0.11.5. I'm still getting io: read/write on closed pipe.

jbardin commented 6 years ago

Thanks for the info @rberlind, we'll get someone to look into what else could be cutting it off.

rberlind commented 6 years ago

Thanks. Actual project I am using is: https://github.com/hashicorp/vault-guides/tree/master/identity/vault-chef-approle. The problem occurs in "Phase 2" when I try to run terraform to provision the chef-node. The specific file with the Chef provisioner is: https://github.com/hashicorp/vault-guides/blob/master/identity/vault-chef-approle/terraform-aws/chef-node/main.tf

WintersMichael commented 6 years ago

I'm also getting this on 0.11.4 / 0.11.5. Here's my scrubbed TRACE output:

2018-03-23T11:49:38.227-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:38 starting remote command: sudo knife node show my-node-04 -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem > /dev/null 2>&1
2018-03-23T11:49:38.929-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:38 remote command exited with '100': sudo knife node show my-node-04 -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem > /dev/null 2>&1
2018-03-23T11:49:38.929-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:38 opening new ssh session
2018-03-23T11:49:39.043-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:39 starting remote command: sudo knife client show my-node-04 -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem > /dev/null 2>&1
2018-03-23T11:49:39.694-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:39 opening new ssh session
2018-03-23T11:49:39.694-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:39 remote command exited with '100': sudo knife client show my-node-04 -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem > /dev/null 2>&1
2018-03-23T11:49:39.806-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:39 starting remote command: sudo knife client create my-node-04 -d -f /etc/chef/client.pem -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "module.my-module.output.publicip", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "output.my-module-privateip", waiting for: "module.my-module.output.privateip"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "module.my-module.output.privateip", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "provisioner.file (close)", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "root", waiting for: "provider.aws (close)"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "provisioner.chef (close)", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "output.my-module-publicip", waiting for: "module.my-module.output.publicip"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "provisioner.remote-exec (close)", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "provider.aws (close)", waiting for: "module.my-node[4]"
2018/03/23 11:49:40 [TRACE] dag/walk: vertex "meta.count-boundary (count boundary fixup)", waiting for: "output.my-module-privateip"
2018-03-23T11:49:40.595-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:40 remote command exited with '0': sudo knife client create my-node-04 -d -f /etc/chef/client.pem -c /etc/chef/client.rb -u my-validator --key /etc/chef/my-validator.pem
2018-03-23T11:49:40.595-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:40 opening new ssh session
2018-03-23T11:49:40.713-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:40 starting remote command: sudo rm -f /etc/chef/my-validator.pem
2018-03-23T11:49:40.824-0500 [DEBUG] plugin.terraform: chef-provisioner (internal) 2018/03/23 11:49:40 remote command exited with '0': sudo rm -f /etc/chef/my-validator.pem
2018/03/23 11:49:40 [ERROR] root.my-module: eval: *terraform.EvalApplyProvisioners, err: io: read/write on closed pipe
2018/03/23 11:49:40 [ERROR] root.my-module: eval: *terraform.EvalSequence, err: io: read/write on closed pipe
2018/03/23 11:49:40 [TRACE] [walkApply] Exiting eval tree: module.my-node[4]

In case it's relevant, my chef provisioner is using a bastion host:

  provisioner "chef" {
    connection {
      user = "${var.user}"
      host = "${self.private_ip}"
      bastion_host = "${var.bastion_host}"
      bastion_user = "${var.bastion_user}"
    }
    skip_install = true
    attributes_json = <<EOF
    redacted
EOF

    environment = "redacted"
    run_list = ["redacted"]
    node_name = "redacted"
    server_url = "redacted"
    user_name = "redacted"
    user_key = "redacted"
    ssl_verify_mode = "redacted"
  }

jkerak commented 6 years ago

I am also using a bastion host when I see this error message, if that helps.

jbardin commented 6 years ago

Closed by #17609

plainlystated commented 6 years ago

Glad to see a fix merged. What's the release schedule look like?

jeffbyrnes commented 6 years ago

Looks like the fix for this shipped with v0.11.6 🎉
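For anyone automating applies (e.g. through a wrapper script, as mentioned earlier in this thread), a small guard against the affected releases can be sketched like this. The version numbers come from the reports above; `is_affected` is a hypothetical helper, and the check itself is only an illustration:

```shell
# Illustrative guard: refuse to provision with the Terraform releases that
# exhibit the "io: read/write on closed pipe" chef provisioner bug.
# Per this thread, 0.11.4 and 0.11.5 are affected; 0.11.3 and >= 0.11.6 work.
is_affected() {
  case "$1" in
    0.11.4|0.11.5) return 0 ;;  # affected releases
    *)             return 1 ;;
  esac
}

# In a real wrapper, the version would come from `terraform version`;
# it is parameterized here so the sketch is self-contained.
if is_affected "0.11.4"; then
  echo "pin to Terraform 0.11.3 or upgrade to >= 0.11.6" >&2
fi
```

This kind of pre-flight check avoids partially applied state (which, as noted above, made downgrading difficult for some reporters).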

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.