test-kitchen / kitchen-ec2

A Test Kitchen Driver for Amazon EC2

user_data does not seem to work #369

Open mlcooper opened 6 years ago

mlcooper commented 6 years ago

With v2.0.0 of kitchen-ec2:

If I include a user_data script while using the AWS CLI, the script runs as expected and produces the expected results in the instance that is stood up. However, when the same user_data script is included in the Test Kitchen config, it does not take effect.

This works successfully:

aws ec2 run-instances --image-id ami-xxxxxx --count 1 --instance-type t2.micro --key-name <redacted> --subnet-id subnet-xxxxxxx --security-group-ids sg-xxxxx sg-xxxxxx --user-data file://cloud-init-script.sh

If I do -l debug in test kitchen, I can see that user_data is getting passed in, but in the resulting instance, the settings that should be there are not.

Creating EC2 instance in region us-west-2 with properties:
D      - instance_type = "t2.micro"
D      - ebs_optimized = false
D      - image_id = "ami-xxxxxx"
D      - key_name = "<redacted>"
D      - private_ip_address = nil
D      - placement = {:availability_zone=>"us-west-2c", :tenancy=>"default"}
D      - user_data = "ZmlsZTovL2Nsb3VkLWluaXQtc2NyaXB0LnNo\n"
D      - network_interfaces = [{:device_index=>0, :associate_public_ip_address=>false, :delete_on_termination=>true, :subnet_id=>"subnet-xxxxxxx", :groups=>["sg-xxxxxxx", "sg-xxxxxxxx"]}]

The user_data script removes the password expiration for the user that Test Kitchen uses to log in and also removes its password. But when TK spins up the instance, we still get the error that the user's password has expired and must be reset.

My .kitchen.yml config:

platforms:
  - name: rhel7
    driver:
      image_id: ami-xxxxxx
      user_data: file://cloud-init-script.sh

The cloud-init-script.sh is in the same directory as the .kitchen.yml file.

cheeseplus commented 6 years ago

I don't believe file:// is expected or handled. We know user_data works in some respect, as this is how all Windows instances are spun up. Try just the path to the file instead of a URI.

mlcooper commented 6 years ago

I went ahead and tried

platforms:
  - name: rhel7
    driver:
      image_id: ami-xxxxxx
      user_data: ./cloud-init-script.sh

And I also tried

platforms:
  - name: rhel7
    driver:
      image_id: ami-xxxxxx
      user_data: cloud-init-script.sh

But both result in the instance telling us the password has expired and must be changed (the user_data script is supposed to take care of that).

The cloud-init-script.sh lives in the same directory as the .kitchen.yml file.

cheeseplus commented 6 years ago

I have constructed a simple test (kitchen 1.19.2 / kitchen-ec2 2.1.0):

test script is user_data_script.sh

#! /bin/bash
echo "user data script ran" >> /root/user_data_script.log

.kitchen.yml

---
driver:
  name: ec2
  region: us-west-1
  user_data: user_data_script.sh

provisioner:
  name: chef_solo

verifier:
  name: inspec

platforms:
  - name: ubuntu-16.04

suites:
  - name: default
    run_list:
      - recipe[kitchen_ec2]

After a kitchen converge (the cookbook doesn't do anything)

# cat /root/user_data_script.log
user data script ran

This seems to indicate the functionality is working as designed but without the contents of the script it's hard to say if something else is happening that would cause the script to be failing. Perhaps you could provide more detail about what is in the script or how exactly you're validating that nothing is being run on the remote system by user_data.

mlcooper commented 6 years ago

Our standard RHEL images come with a user called cloud. The first time you ssh into an instance with this image, you are forced to change the password for cloud. So for Test Kitchen purposes, we're trying to pass in a user_data script that blows away the password to cloud and changes its expiration time so that TK can continue on with the chef run. At the AWS CLI it works, but in TK it hangs here indefinitely:

       EC2 instance <i-0axxxxxxxxxxxxx> ready (hostname: 10.x.x.x).
       Waiting for SSH service on 10.x.x.x:22, retrying in 3 seconds
       WARNING: Your password has expired.
       You must change your password now and login again!
       Changing password for user cloud.

The contents of our user_data script are:

#!/bin/sh
chage -d $(date +%Y-%m-%d) cloud 
passwd --delete cloud

cheeseplus commented 6 years ago

How are you telling kitchen what user to use? I don't see any transport settings in any of the snippets provided. Also would be good to get updated debug output, as we can then confirm the user_data via a base64 decode.

mlcooper commented 6 years ago

Sorry I omitted the transport settings we have in place:

platforms:
  - name: rhel7
    driver:
      image_id: ami-xxxx
      user_data: ./cloud-init-script.sh
    transport:
      username: cloud
      # your private key for your AWS Key Pair (Network & Security --> Key Pairs)
      ssh_key: <%= ENV['AWS_KEYFILE'] %>
      name: rsync
      connection_timeout: 10
      connection_retries: 5

New debug output from today:

D      Creating EC2 instance in region us-west-2 with properties:
D      - instance_type = "t2.micro"
D      - ebs_optimized = false
D      - image_id = "ami-xxxxxxxxx"
D      - key_name = "<redacted>"
D      - private_ip_address = nil
D      - placement = {:availability_zone=>"us-west-2c", :tenancy=>"default"}
D      - user_data = "Li9jbG91ZC1pbml0LXNjcmlwdC5zaA==\n"
D      - network_interfaces = [{:device_index=>0, :associate_public_ip_address=>false, :delete_on_termination=>true, :subnet_id=>"subnet-xxxxxxx", :groups=>["sg-xxxxxxx", "sg-xxxxxxxx"]}]

cheeseplus commented 6 years ago

The base64 decodes to ./cloud-init-script.sh, which means it's not recognizing that as a file and is instead taking it literally as a string. Try cloud-init-script.sh and provide the user_data debug output (or decode it yourself at https://www.base64decode.org/).

The other thing that stands out is the transport being used is rsync which is something we don't actively support.
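
The debug values can also be decoded locally instead of through a website. A minimal sketch using Ruby's standard Base64 module, with the two user_data strings copied from the debug output earlier in this thread (the variable names are only for illustration):

```ruby
require "base64"

# user_data values copied from the kitchen debug output in this thread
first_attempt  = "ZmlsZTovL2Nsb3VkLWluaXQtc2NyaXB0LnNo\n"
second_attempt = "Li9jbG91ZC1pbml0LXNjcmlwdC5zaA==\n"

decoded_first  = Base64.decode64(first_attempt)
decoded_second = Base64.decode64(second_attempt)

puts decoded_first   # file://cloud-init-script.sh
puts decoded_second  # ./cloud-init-script.sh
```

Both values decode to the configured path string rather than to the script's content.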

mlcooper commented 6 years ago

In the debug output are you expecting to see the contents of the file, or just cloud-init-script.sh? I changed the settings to:

  - name: rhel7
    driver:
      image_id: ami-xxxxxx
      user_data: cloud-init-script.sh
    transport:
      username: cloud
      # your private key for your AWS Key Pair (Network & Security --> Key Pairs)
      ssh_key: <%= ENV['AWS_KEYFILE'] %>
      name: sftp
      connection_timeout: 10
      connection_retries: 5

And I decoded the user_data and it appears to have an extra character ' in what got decoded.

D      Creating EC2 instance in region us-west-2 with properties:
D      - instance_type = "t2.micro"
D      - ebs_optimized = false
D      - image_id = "ami-xxxxx"
D      - key_name = "<redacted>"
D      - private_ip_address = nil
D      - placement = {:availability_zone=>"us-west-2c", :tenancy=>"default"}
D      - user_data = "Y2xvdWQtaW5pdC1zY3JpcHQuc2g=\n"
D      - network_interfaces = [{:device_index=>0, :associate_public_ip_address=>false, :delete_on_termination=>true, :subnet_id=>"subnet-xxxx", :groups=>["sg-xxxx", "sg-xxxx"]}]

cheeseplus commented 6 years ago

The base64 should be the content of the file. I don't think using the sftp transport would change anything, but @coderanger might know if that's related. I need to do some more testing and try to narrow down a repro; encoding is always fickle, so I'm walking through changes to double-check things.
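
For reference, the expected behavior (read the file if the value names one, otherwise send the string as-is) can be sketched roughly as below. This is an illustration under that assumption, not the driver's actual code; prepare_user_data is a hypothetical helper name.

```ruby
require "base64"
require "tmpdir"

# Hypothetical helper: returns the base64 of the file's content when `value`
# names an existing regular file, otherwise the base64 of the string itself.
def prepare_user_data(value)
  return nil if value.nil?

  raw = File.file?(value) ? File.read(value) : value
  Base64.encode64(raw)
end

Dir.mktmpdir do |dir|
  script = File.join(dir, "cloud-init-script.sh")
  File.write(script, "#!/bin/sh\npasswd --delete cloud\n")

  from_file   = prepare_user_data(script)
  from_string = prepare_user_data("file://cloud-init-script.sh")

  puts Base64.decode64(from_file)    # the script's content
  puts Base64.decode64(from_string)  # the literal string, as no such file exists
end
```

With a fallback like this, a path that does not resolve to a file produces no error: the path text itself is sent as user_data, which matches what the debug output shows.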

mlcooper commented 6 years ago

Good to know. With all of the combinations I have tried, I have never seen the decoded value of the base64 be the content of the file. I've only seen it be the literal string (the path to the file or the filename itself).

cheeseplus commented 6 years ago

Also, that site I linked might not actually be helping. The ideal way to decode is with the same Ruby library, minding the double quotes:

[3] pry(main)> Base64.decode64("IyEgL2Jpbi9iYXNoCmVjaG8gInVzZXIgZGF0YSBzY3JpcHQgcmFuIiA+PiAv\ncm9vdC9sb2cubG9n\n")
=> "#! /bin/bash\necho \"user data script ran\" >> /root/log.log"
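
The \n at the end of each user_data value in the debug output is expected with Ruby's Base64.encode64, which adds a line feed after every 60 encoded characters and at the end; Base64.strict_encode64 adds none. A short sketch of the difference (that the driver encodes with encode64 is an assumption based on the debug output):

```ruby
require "base64"

name = "cloud-init-script.sh"

wrapped = Base64.encode64(name)         # line feed every 60 chars and at the end
strict  = Base64.strict_encode64(name)  # no line feeds at all

p wrapped
p strict
```

Pasting the value into another decoder together with the surrounding quotes or the escaped \n may give slightly different results than Base64.decode64 does.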

I also tried using the same file content and name as your examples, so perhaps something about that specific file is strange?

[2] pry(main)> Base64.decode64("IyEvYmluL3NoCmNoYWdlIC1kICQoZGF0ZSArJVktJW0tJWQpIGNsb3VkIApw\nYXNzd2QgLS1kZWxldGUgY2xvdWQ=\n")
=> "#!/bin/sh\nchage -d $(date +%Y-%m-%d) cloud \npasswd --delete cloud"

mlcooper commented 6 years ago

You're right, with the Ruby library the extra ' is not there. Here is the decoding of the last debug output I posted above:

irb(main):003:0> Base64.decode64("Y2xvdWQtaW5pdC1zY3JpcHQuc2g=\n")
=> "cloud-init-script.sh"

cheeseplus commented 6 years ago

Is this a real file and not a symlink? Still thinking it's something about this specific file and that'll be critical in adding a better check/error in our code.

mlcooper commented 6 years ago

Correct - it's a real file:

[user@localhost kitchen]$ ll -a
total 20
drwxrwxr-x 3 user user 4096 Jan 29 10:00 .
drwxrwxr-x 4 user user 4096 Jan 24 13:15 ..
-rw-rw-r-- 1 user user   76 Jan 24 17:20 cloud-init-script.sh
drwxrwxr-x 3 user user 4096 Jan 29 10:00 .kitchen
-rw-rw-r-- 1 user user 2487 Jan 30 10:09 .kitchen.ec2.yml

The .kitchen.ec2.yml file is the one that has this config in it:

  - name: rhel7
    driver:
      image_id: ami-xxxxx
      user_data: cloud-init-script.sh
    transport:
      username: cloud
      # your private key for your AWS Key Pair (Network & Security --> Key Pairs)
      ssh_key: <%= ENV['AWS_KEYFILE'] %>
      name: sftp
      connection_timeout: 10
      connection_retries: 5

coderanger commented 6 years ago

Where are you running the kitchen executable from? The path would be relative to the working directory when things are run, I think, not relative to the config file.
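
If that is the cause, the same relative name would resolve or not depending on where kitchen is launched. A minimal sketch of the effect, using a throwaway directory layout that loosely mirrors the listing above (the layout and the found_from? helper are assumptions for illustration), plus one way to anchor the path to a known base directory:

```ruby
require "tmpdir"

# Hypothetical helper: checks whether `relative_path` names a regular file
# when the process's working directory is `working_dir`.
def found_from?(working_dir, relative_path)
  # File.file? resolves relative paths against the current working directory
  Dir.chdir(working_dir) { File.file?(relative_path) }
end

Dir.mktmpdir do |cookbook_root|
  config_dir = File.join(cookbook_root, "kitchen")
  Dir.mkdir(config_dir)
  File.write(File.join(config_dir, "cloud-init-script.sh"), "#!/bin/sh\n")

  puts found_from?(config_dir, "cloud-init-script.sh")     # true
  puts found_from?(cookbook_root, "cloud-init-script.sh")  # false

  # Expanding against an explicit base makes the result independent of Dir.pwd
  absolute = File.expand_path("cloud-init-script.sh", config_dir)
  puts found_from?(cookbook_root, absolute)                # true
end
```

So a script that sits next to the config file is only found by its bare name when kitchen is run from that same directory; otherwise a path relative to the launch directory, or an absolute path, would be needed.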

mlcooper commented 6 years ago

[user@localhost rds]$ which kitchen
/opt/chefdk/embedded/bin/kitchen

When I run a command like kitchen test.... I do it from the root level of the cookbook I am testing.

espoelstra commented 6 years ago

@mlcooper Previously you stated this was in your .kitchen.yml, but the last example you pasted has it in your .kitchen.ec2.yml, are you setting KITCHEN_YAML=.kitchen.ec2.yml in your call to kitchen test?

Does it behave any differently if you do a chef exec kitchen test ?

youness-teimoury commented 6 years ago

This worked for me (assuming the ec2_startup_script.txt script is in the same directory):

user_data: Base64.encode64(File.open(File.join(File.dirname(__FILE__), "ec2_startup_script.txt"), "rb").read)

Also, please don't forget to require 'base64'