hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: AWS Provider 4.67.0 gzip on template_cloudinit_config causes join domain failure #31466

Open mike-alvarez opened 1 year ago

mike-alvarez commented 1 year ago

Terraform Core Version

1.4.5

AWS Provider Version

4.67.0

Affected Resource(s)

Expected Behavior

Setting the "gzip" option to true should not affect the Active Directory "realm" join. Instead, it caused the join to fail.

Actual Behavior

During execution of the userdata.sh script, the realm join command failed:

Failed to join domain: Failed to set machine spn: Constraint violation
Do you have sufficient permissions to create machine accounts?
ads_print_error: AD LDAP ERROR: 19 (Constraint violation): 0000200B: AtrErr: DSID-033E1153, #1:
        0: 0000200B: DSID-033E1153, problem 1005 (CONSTRAINT_ATT_TYPE), data 0, Att 906b5 (msDS-AdditionalDnsHostName)

Relevant Error/Panic Output Snippet

Failed to join domain: Failed to set machine spn: Constraint violation
Do you have sufficient permissions to create machine accounts?
ads_print_error: AD LDAP ERROR: 19 (Constraint violation): 0000200B: AtrErr: DSID-033E1153, #1:
        0: 0000200B: DSID-033E1153, problem 1005 (CONSTRAINT_ATT_TYPE), data 0, Att 906b5 (msDS-AdditionalDnsHostName)

Terraform Configuration Files

Template

data "template_cloudinit_config" "cloudinit" {
  gzip          = true
  base64_encode = true

  # Main cloud-config configuration file.
  part {
    filename     = "cloudinit.cfg"
    content_type = "text/cloud-config"
    content      = data.template_file.script.rendered
  }

  part {
    filename     = "userdata.txt"
    content_type = "text/x-shellscript"
    content      = data.template_file.build_script_ec2.rendered
  }
}

Reference to the template

resource "aws_launch_template" "launch_template_ccc" {
  name_prefix   = var.ec2_name
  image_id      = var.image_id
  instance_type = var.ec2_config.instance_type
  key_name      = aws_key_pair.ec2_key_pair.key_name
  user_data     = data.template_cloudinit_config.cloudinit.rendered

  network_interfaces {
    security_groups = [aws_security_group.security_group_description.id]
  }

  iam_instance_profile {
    name = data.aws_iam_instance_profile.instance_profile.name
  }

  block_device_mappings {
    device_name = var.ec2_config.root_device_name
    ebs {
      volume_size           = var.ec2_config.root_volume_size
      delete_on_termination = true
      encrypted             = true
      volume_type           = var.ec2_config.root_volume_type
    }
  }

  block_device_mappings {
    device_name = var.ec2_config.ebs_device_name
    ebs {
      volume_size           = var.ec2_config.ebs_volume_size
      delete_on_termination = true
      encrypted             = true
      volume_type           = var.ec2_config.ebs_volume_type
    }
  }

  tag_specifications {
    resource_type = "volume"

    tags = {
      Name           = "${var.ec2_name}-ebs-volume"
      owner          = lookup(var.tagset, "owner")
      group          = lookup(var.tagset, "group")
      application-id = lookup(var.tagset, "application-id")
      environment    = lookup(var.tagset, "environment")
    }
  }

  depends_on = [aws_s3_object.install_zip, aws_route53_record.domain-record]
}

Command to join AD

echo ${domainJoinUserPW} | base64 --decode | /usr/sbin/realm join --user=${domainJoinUser} ${directoryDNSName} --verbose

Steps to Reproduce

In template_cloudinit_config, set the "gzip" option to true: the realm join fails.

Set the "gzip" option to false: the realm join succeeds.

Debug Output

n/a

Panic Output

n/a

Important Factoids

Cloud Provider = AWS

Template provider version: v2.2.0

If we manually create the EC2 instance and manually execute the userdata.sh script, it works as expected.

Recently, we upgraded Terraform from 0.15.x to 1.4.5. This is when the issue started to occur.

The userdata.sh script gets another script from an AWS S3 bucket. It is that 2nd script that issues the "realm" command.

Once the EC2 instance has been created using userdata.sh, manually rerunning the "realm" command also fails. It is as if something in the environment was modified.

References

No response

Would you like to implement a fix?

None

justinretzolk commented 1 year ago

Hey @mike-alvarez 👋 Thank you for taking the time to raise this! On initial review, I'm not certain that this appears to be a bug with the AWS Provider, since the resource(s) are created as expected, and the error seems to stem from the cloud init script. I noticed that you mentioned upgrading versions of Terraform recently. Was the version of the AWS provider changed at that point in time as well?

mike-alvarez commented 1 year ago

I did some further testing.

I believe that you are correct. It is not an issue with the AWS Provider. It is an issue with Active Directory itself.

After adding retry logic to userdata.sh that uses the IP address of Active Directory instead of its DNS name, the code is now working.
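For anyone hitting the same error, here is a minimal sketch of what such retry logic could look like. The thread does not show the actual change, so the `retry` helper, the attempt count, and the delay are all assumptions for illustration and not the code that was used. The sketch avoids the `${...}` form because the script in this issue is rendered through template_file, which treats that syntax as interpolation.

```shell
# Hypothetical helper (not from the original userdata.sh):
# run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2
  local attempt=1

  # "$@" is re-run on every pass; the loop ends as soon as it succeeds.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in $delay_seconds seconds" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}
```

Because the join command reads the password from a pipe, it would need to be wrapped in a function so that each attempt gets a fresh copy of the input, for example a `join_domain` function containing the realm join pipeline, called as `retry 5 30 join_domain`. The values 5 and 30 are placeholders, and how the Active Directory IP address was passed to the command is not shown in this thread.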
