hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

Issue with SSH+IAP in `googlecompute` builder with Ubuntu images. #12169

Open vicvi1997 opened 1 year ago

vicvi1997 commented 1 year ago

Hello,

We run a Bash script that invokes Packer to build a GCE image with the `googlecompute` builder. It worked when `source_image_family` was set to `centos-7`, and later to `rocky-linux-8-optimized-gcp`, but it stopped working when we changed it to `ubuntu-2204-lts`. Details below:

The Packer template is the following:

```json
{
  "variables": {
    "env": "{{env `TABLEAU_ENV`}}",
    "project_id": "{{env `TABLEAU_PROJECT_ID`}}",
    "tableau_version": "{{env `TABLEAU_VERSION`}}",
    "region": "{{env `TABLEAU_REGION`}}",
    "zone": "{{env `TABLEAU_ZONE`}}",
    "subnetwork": "{{env `TABLEAU_SUBNETWORK`}}",
    "tableau_user": "{{env `TABLEAU_USER`}}",
    "source_image_family": "{{env `PACKER_SOURCE_IMAGE_FAMILY`}}",
    "image_family": "{{env `PACKER_IMAGE_FAMILY`}}",
    "image_name": "{{env `PACKER_IMAGE_NAME`}}",
    "machine_type": "{{env `PACKER_MACHINE_TYPE`}}"
  },
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "{{user `project_id`}}",
      "region": "{{user `region`}}",
      "zone": "{{user `zone`}}",
      "subnetwork": "projects/{{user `project_id`}}/regions/{{user `region`}}/subnetworks/{{user `subnetwork`}}",
      "source_image_family": "{{user `source_image_family`}}",
      "image_family": "{{user `image_family`}}",
      "image_name": "{{user `image_name`}}",
      "machine_type": "{{user `machine_type`}}",
      "ssh_username": "{{user `tableau_user`}}",
      "use_iap": true,
      "instance_name": "packer-{{uuid}}"
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "scripts.tgz",
      "destination": "/home/{{user `tableau_user`}}/scripts.tgz"
    },
    {
      "type": "shell",
      "inline_shebang": "/usr/bin/bash -e",
      "inline": [
        "cd /home/{{user `tableau_user`}}",
        "tar -zxvf scripts.tgz",
        "rm -f scripts.tgz",
        "cd"
      ]
    }
  ]
}
```

The following is the output of `packer validate` followed by `packer build`:

```
The configuration is valid.
Packer template is valid.
Debug mode enabled. Builds will not be parallelized.
googlecompute: output will be in this color.

==> googlecompute: Checking image does not exist...
==> googlecompute: Creating temporary RSA SSH key for instance...
    googlecompute: Saving key for debug purposes: gce_googlecompute.pem
==> googlecompute: Using image: ubuntu-2204-jammy-v20221206
==> googlecompute: Creating instance...
    googlecompute: Loading zone: europe-west1-d
    googlecompute: Loading machine type: e2-medium
    googlecompute: Requesting instance creation...
    googlecompute: Waiting for creation operation to complete...
    googlecompute: Instance has been created!
    googlecompute: Instance: packer-63a02fc3-ac2c-b50b-351b-9b65220f4348 started in europe-west1-d
==> googlecompute: Waiting for the instance to become running...
    googlecompute: Public IP: 35.205.110.157
    googlecompute: IP: 35.205.110.157
==> googlecompute: Step Launch IAP Tunnel...
==> googlecompute: Using SSH communicator to connect: localhost
==> googlecompute: Waiting for SSH to become available...
==> googlecompute: Error waiting for SSH: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
==> googlecompute: Deleting instance...
    googlecompute: Instance has been deleted!
==> googlecompute: Deleting disk...
    googlecompute: Disk has been deleted!
Build 'googlecompute' errored after 3 minutes 18 seconds: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

==> Wait completed after 3 minutes 18 seconds

==> Some builds didn't complete successfully and had errors:
--> googlecompute: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

==> Builds finished but no artifacts were created.
GCE image failed to build.
ABORTING
```

The error seems to be related to the `"use_iap": true` setting. We need IAP in our use case, so we cannot turn it off. There might be a problem with the IAP functionality on Ubuntu images that is not present on RHEL-based ones.

The issue should be easy to reproduce with the information provided, but let me know if you need more information from me.

cmcga1125 commented 1 year ago

I can confirm the same behavior. I was trying to use Cloud Build to create an image based on `ubuntu-2204` with some simple scripts that add software, and it consistently failed when trying to SSH to the VM through IAP. After finding this issue I switched to the `ubuntu-2004-lts` family, and it works fine.

```
==> googlecompute: Error waiting for SSH: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
```

javierlga commented 1 year ago

In my case, the problem was that the temporary script that sets up the tunnel was being removed while the tunnel was still bootstrapping. I was looking at the code, and the `gcloud-setup` script contains the `gcloud compute start-iap-tunnel` command. I ran the following commands in my terminal:

```
watch 'ps aux | grep gcloud-setup'
watch 'ps aux | grep tunnel'
```

That's how I noticed the race condition, though I'm not sure whether it's caused by `NewTunnelDriver`.

After reading the documentation, I found that the `iap_tunnel_launch_wait` option sets how long, in seconds, Packer waits before treating the tunnel launch as successful. Setting it to `90` fixed the problem.
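A minimal sketch of that change, applied to the builder from the template in the original report. The only addition is `"iap_tunnel_launch_wait": 90`, the value reported to work in this thread; the `variables` and `provisioners` sections are omitted here and assumed unchanged, and the wait may need tuning for other environments.

```json
{
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "{{user `project_id`}}",
      "region": "{{user `region`}}",
      "zone": "{{user `zone`}}",
      "subnetwork": "projects/{{user `project_id`}}/regions/{{user `region`}}/subnetworks/{{user `subnetwork`}}",
      "source_image_family": "{{user `source_image_family`}}",
      "image_family": "{{user `image_family`}}",
      "image_name": "{{user `image_name`}}",
      "machine_type": "{{user `machine_type`}}",
      "ssh_username": "{{user `tableau_user`}}",
      "use_iap": true,
      "iap_tunnel_launch_wait": 90,
      "instance_name": "packer-{{uuid}}"
    }
  ]
}
```

This keeps `use_iap` enabled, which the original reporter needs, and only gives the tunnel more time to come up before Packer starts the SSH handshake.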