vultr / terraform-provider-vultr

Terraform Vultr provider
https://www.terraform.io/docs/providers/vultr/
Mozilla Public License 2.0

[BUG] - Instance creation may complete before init scripts are finished #440

Open: HartS opened this issue 7 months ago

HartS commented 7 months ago

I'm actually using Pulumi, but https://github.com/dirien/pulumi-vultr appears to be largely generated from the Vultr Terraform provider.

I noticed with Ubuntu 22.04 that when I run pulumi up with a user-data script that installs docker, the installation is still in progress when the command completes: server_status is still installingbooting, and docker is unavailable.

To Reproduce

With Pulumi, set the vultr:apiKey and privateKeyFile config values, then run pulumi up with the following Pulumi.yaml:

name: repro
description: repro the issue with not waiting for server_status=ok
runtime: yaml
template:
  description: Vultr API credentials
  config:
    vultr:apiKey:
      secret: true
resources:
  publicKey:
    type: command:local:Command
    properties:
      create: "ssh-keygen -yf ${privateKeyFile}"
  privateKey:
    type: command:local:Command
    properties:
      create: "cat ${privateKeyFile}"
    options:
      additionalSecretOutputs:
      - stdout
  sshkey:
    type: vultr:SSHKey
    properties:
      name: Main
      sshKey: ${publicKey.stdout}
    options:
      protect: true
  dev:
    type: vultr:Instance
    properties:
      # Ubuntu 22.04 x64
      osId: 1743
      plan: vhp-2c-4gb-amd
      region: sea
      sshKeyIds:
      - ${sshkey.id}
      backups: "disabled"
      enableIpv6: true
      hostname: jukejam-dev
      userData:
        fn::readFile: "./setup.sh"
  dockerPsOutput:
    type: command:remote:Command
    properties:
      connection:
        host: ${dev.mainIp}
        user: ubuntu
        privateKey: ${privateKey.stdout}
      create: "docker ps"

and setup.sh, which runs via cloud-init:

#!/usr/bin/env bash

cat << 'EOF' > /etc/sudoers.d/90-cloudimg-ubuntu
# ubuntu user is default user in cloud-images.
# It needs passwordless sudo functionality.
ubuntu ALL=(ALL) NOPASSWD:ALL
EOF

cat ~/.ssh/authorized_keys >> /home/ubuntu/.ssh/authorized_keys

# Install docker
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Allow ubuntu user to run docker without sudo:
gpasswd -a ubuntu docker

Expected behavior

The dockerPsOutput output should contain "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES", the header line printed by docker ps.

Additional context

For reference, I added the following resource after dev:

  # Check the Vultr API until server_status is 'ok'
  devIsReady:
    options:
      dependsOn:
      - ${dev}
    type: command:local:Command
    properties:
      # Uses POSIX [ ] rather than [[ ]] so the loop also works when /bin/sh is dash
      create: "while [ \"$(curl -s https://api.vultr.com/v2/instances/${dev.id} -H 'Authorization: Bearer ${vultr:apiKey}' | jq -r .instance.server_status)\" != 'ok' ]; do sleep 1; done"

and modified the dockerPsOutput resource:

    options:
      dependsOn:
      - ${devIsReady}

With the above changes, pulumi up now waits for server_status to become ok, and the next step succeeds. However, this introduces a delay of about 7.5 minutes, because the server status takes a long time to transition out of installingbooting; that seems like a separate issue on Vultr's end.
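The loop above has no upper bound, so if an instance never reaches ok (or the API call keeps failing) the step waits forever. A minimal bash sketch of a bounded variant, assuming bash (for SECONDS and [[ ]]), curl and jq on the machine running it, and an API key in a VULTR_API_KEY environment variable. The function names and the 5-second interval are illustrative choices, not part of the provider or any Vultr tooling:

```shell
#!/usr/bin/env bash
# Poll a status command until it prints the expected value or a deadline passes.
# Usage: wait_for_status <expected> <timeout_seconds> <command> [args...]
# (hypothetical helper, not part of the Vultr provider)
wait_for_status() {
  local expected=$1 timeout=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  local status=""
  while (( SECONDS < deadline )); do
    # A failing status command (network error, bad JSON) counts as "not ready yet".
    status=$("$@") || status=""
    if [[ $status == "$expected" ]]; then
      return 0
    fi
    sleep 5  # assumed poll interval; the original loop used 1 second
  done
  echo "timed out waiting for '$expected' (last status: '${status:-unknown}')" >&2
  return 1
}

# Status command for the Vultr API v2, mirroring the curl | jq pipeline above.
# VULTR_API_KEY is an assumed environment variable holding the API key.
vultr_server_status() {
  curl -fsS "https://api.vultr.com/v2/instances/$1" \
    -H "Authorization: Bearer ${VULTR_API_KEY}" | jq -r .instance.server_status
}
```

For example, wait_for_status ok 900 vultr_server_status "$INSTANCE_ID" gives up after 15 minutes, comfortably above the ~7.5 minutes observed here; INSTANCE_ID is a placeholder for the instance ID (dev.id in the Pulumi program).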

Ideally, there would be a way to configure the Terraform provider to wait until cloud-init user scripts have finished. As a workaround, a later step can run cloud-init status --wait on the instance.
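A minimal sketch of that workaround as a bash helper, assuming key-based SSH access as the ubuntu user (as in the Pulumi program above) and an image that ships cloud-init. wait_for_cloud_init and SSH_CMD are hypothetical names used only for illustration. cloud-init status --wait blocks until cloud-init has finished and exits non-zero if it reported an error:

```shell
#!/usr/bin/env bash
# Block until cloud-init on a remote instance has finished processing user data.
# Usage: wait_for_cloud_init <host> [user]
# SSH_CMD (hypothetical) may be overridden, e.g. "ssh -i ~/.ssh/id_ed25519";
# it defaults to plain ssh.
wait_for_cloud_init() {
  local host=$1 user=${2:-ubuntu}
  # SSH_CMD is left unquoted on purpose so extra options split into separate arguments.
  # BatchMode=yes makes ssh fail instead of prompting for a password.
  ${SSH_CMD:-ssh} -o BatchMode=yes "${user}@${host}" 'cloud-init status --wait'
}
```

Running this as its own step, before a separate step that runs docker ps, matters for this setup.sh: gpasswd -a ubuntu docker only affects sessions that start after it has run, so running docker ps in the same SSH session as the wait can still fail with a permission error.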

HartS commented 7 months ago

Note: I highlight waiting on server_status=ok because the provider could trivially wait on it in resource_vultr_instance.go, using the waitForServerAvailable function already defined there. See https://github.com/vultr/terraform-provider-vultr/compare/master...HartS:terraform-provider-vultr:master

Given how long server_status takes to transition to ok (compared with cloud-init status --wait, which introduces a much more reasonable delay), I suspect this isn't currently the right approach.

optik-aper commented 6 months ago

@HartS Are you able to reproduce the issue when using the Terraform provider directly? I just did a quick test, and docker was installed after using your script as the user data, like so:

resource "vultr_instance" "inst" {
  region = "mel"
  plan = "vc2-2c-4gb"
  label = "tf-ud-test"
  os_id = 1743
  tags = ["tf"]
  user_data = file("~/dump/setup.sh")
}

Here ~/dump/setup.sh is your script. Afterwards, SSHing in and checking confirmed that docker was installed.

Can you check the user data in my.vultr.com to verify that the script is there in plain text?
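One way to make that check scriptable, as a minimal bash sketch: given the user data as a base64 string (how you obtain it, for example from an API response, is left open and is an assumption here), decode it once and see whether it looks like a script or a cloud-config file. If one decode yields more base64 instead, the script was probably encoded twice on the way in. check_user_data is a hypothetical helper, and the double-encoding explanation is a hypothesis rather than something established in this thread:

```shell
#!/usr/bin/env bash
# Decode base64 user data once and report whether it is a plain script/cloud-config.
# Usage: check_user_data <base64-string>
# Returns 0 for plain text, 1 for something else, 2 for invalid base64.
# Assumes a base64 binary that accepts -d for decoding (e.g. GNU coreutils).
check_user_data() {
  local decoded
  decoded=$(printf '%s' "$1" | base64 -d 2>/dev/null) || {
    echo "not valid base64"
    return 2
  }
  case $decoded in
    '#!'*|'#cloud-config'*)
      echo "plain text"
      return 0
      ;;
    *)
      # A second layer of base64 (or some other encoding) would land here.
      echo "not plain text (possibly encoded twice)"
      return 1
      ;;
  esac
}
```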