hashicorp / packer-plugin-googlecompute

Packer plugin for Google Compute Builder
https://www.packer.io/docs/builders/googlecompute
Mozilla Public License 2.0

GCP, OS Login, and Ansible #79

Open verdverm opened 2 years ago

verdverm commented 2 years ago

Overview of the Issue

I'm trying to enable OS Login on a previously working Packer + Ansible setup.

It appears that when the Ansible user task targets the same user that Packer uses in the Ansible provisioner, login breaks.

Reproduction Steps

Put the files below in the same directory and run packer build packer.json. There are a few variables that you may want to change.

If you remove either of the following... the image will build

Packer version

Packer v1.7.4

Simplified Packer Template

packer.json
{
  "variables": {
    "gcp_project": "ferrum-dev"
  },
  "builders": [
    {
      "image_name": "repro-{{isotime | clean_resource_name}}",
      "image_family": "packer-repro",
      "type": "googlecompute",
      "project_id": "{{ user `gcp_project` }}",
      "source_image": "centos-7-v20211105",
      "ssh_username": "centos",
      "zone": "us-central1-a",

      "machine_type": "n1-standard-2",

      "service_account_email": "devops-bot@{{ user `gcp_project` }}.iam.gserviceaccount.com",
      "scopes": ["https://www.googleapis.com/auth/cloud-platform"],

      "use_internal_ip": true,
      "omit_external_ip": true,
      "metadata": {
        "enable-oslogin": "True"
      },
      "use_os_login": true,

      "network": "projects/{{ user `gcp_project` }}/global/networks/{{ user `gcp_project` }}",
      "subnetwork": "projects/{{ user `gcp_project` }}/regions/us-central1/subnetworks/{{ user `gcp_project` }}-central-subnet"
    }
  ],

  "provisioners": [
    {
      "type": "ansible",
      "user": "centos",
      "ansible_env_vars": [
          "ANSIBLE_HOST_KEY_CHECKING=False",
          "ANSIBLE_SSH_ARGS='-o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s'",
          "ANSIBLE_NOCOLOR=True",
          "ANSIBLE_DEBUG=false",
          "ANSIBLE_VERBOSITY=1"
      ],
      "playbook_file": "./playbook.yml"
    }
  ]

}
playbook.yml
---
- name: Packer OS Login repro
  hosts: all
  become: true
  become_method: sudo
  vars:
    GO_VERSION: "1.17.5"
  tasks:
  - name: Ensure Centos User
    user:
      name: centos
      state: present

  - name: Download Golang
    get_url:
      url: https://dl.google.com/go/go{{ GO_VERSION }}.linux-amd64.tar.gz
      dest: /tmp/go{{ GO_VERSION }}.linux-amd64.tar.gz

Operating system and Environment details

Running from debian-10, against a centos-7 host

Log Fragments and crash.log files

    googlecompute: TASK [Download Golang] *********************************************************
    googlecompute: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo /home/centos/.ansible/tmp `\"&& mkdir /home/centos/.ansible/tmp/ansible-tmp-1641501772.3634222-603-133097428795282 && echo ansible-tmp-1641501772.3634222-603-133097428795282=\"` echo /home/centos/.ansible/tmp/ansible-tmp-1641501772.3634222-603-133097428795282 `\" ), exited with result 1", "unreachable": true}
nywilken commented 2 years ago

Hi @verdverm, thanks for reaching out. Looking at how the OS Login SSH logic works, I suspect that the issue here is the username being provided to the Ansible provisioner.

When setting "use_os_login" to true, the GCloud SDK will create or use an SSH key on GCE for authenticating against the instance being provisioned. Usually that username is the name of the service principal, or the email address of the user if not using a service principal. Packer is aware of this requirement and will automatically update the value of SSHUsername to match the one associated with the service principal. Here is a link to the code.

What that means here is that the username being used is not "centos" but the one obtained from the OS Login API call, which is probably something like:

"devops-bot@{{ user `gcp_project` }}.iam.gserviceaccount.com"

In order to make provisioners work with the captured SSH username and SSH key, Packer will overwrite the SSHUsername value set in "ssh_username". Since Ansible by default does not use this value, I believe the value for "user" needs to match the name of the service principal, or alternatively be set to {{build ``User``}}.

If you update the Ansible provisioner block to look like the following, are you able to connect?

    {
      "type": "ansible",
      "user": "{{build ``User``}}",
      "ansible_env_vars": [
          "ANSIBLE_HOST_KEY_CHECKING=False",
          "ANSIBLE_SSH_ARGS='-o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s'",
          "ANSIBLE_NOCOLOR=True",
          "ANSIBLE_DEBUG=false",
          "ANSIBLE_VERBOSITY=1"
      ],
      "playbook_file": "./playbook.yml"
    }

*Please note that there are extra back ticks in the build variable assigned to "user" that need to be removed when testing.
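
For reference, here is the same suggested block with the extra back ticks removed, so that the build variable is written the way the Packer JSON template engine expects it. This only restates the suggestion above as a sketch to test with; it is not a confirmed fix, and whether the Ansible provisioner's "user" field interpolates build variables at all is an assumption here.

```json
{
  "type": "ansible",
  "user": "{{build `User`}}",
  "ansible_env_vars": [
    "ANSIBLE_HOST_KEY_CHECKING=False",
    "ANSIBLE_SSH_ARGS='-o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s'",
    "ANSIBLE_NOCOLOR=True",
    "ANSIBLE_DEBUG=false",
    "ANSIBLE_VERBOSITY=1"
  ],
  "playbook_file": "./playbook.yml"
}
```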

verdverm commented 2 years ago

I'm seeing the following error after updating to use the build user:

    googlecompute: TASK [Gathering Facts] *********************************************************
    googlecompute: fatal: [default]: FAILED! => {"msg": "template error while templating string: unexpected '.'. String: {{.User}}"}
verdverm commented 2 years ago

Also note that this issue only happens with the centos user, not any other user. In other words, if the user created in Ansible is not centos, the build passes, even with the centos user set in all of the packer.json sections.

Note, the first task is only showing my identity in the home directory as well.