vultr / terraform-provider-vultr

Terraform Vultr provider
https://www.terraform.io/docs/providers/vultr/
Mozilla Public License 2.0

[BUG] - Inconsistent success attaching block storage -- Instance is locked #482

Open Jugbot opened 7 months ago

Jugbot commented 7 months ago

Describe the bug

I am creating a new instance and attaching a block storage device to the instance at the same time.

resource "vultr_instance" "nodejs_server" {
  plan      = var.nodejs_plan
  os_id     = 1743
  region    = var.region
  label     = "nodejs-backend"
  hostname  = var.hostname
  vpc2_ids  = [vultr_vpc2.my_vpc2.id]
  script_id = vultr_startup_script.setup_script.id
  lifecycle {
    replace_triggered_by = [
      terraform_data.always_run
    ]
  }
}

resource "vultr_startup_script" "setup_script" {
  name = "Server Setup"
  type = "boot"
  script = base64encode(templatefile("server_setup.sh", {
    repository_url  = var.REPOSITORY_URL
    mysql_user      = var.DB_USER
    mysql_password  = var.DB_PASSWORD
    mysql_host      = vultr_database.mysql_db.host
    mysql_port      = vultr_database.mysql_db.port
    mysql_db_schema = vultr_database_db.my_database_db.name
    email_name      = var.EMAIL_NAME
    email_password  = var.EMAIL_PASSWORD
    nginx_config    = file("nginx.conf")
  }))
}

resource "terraform_data" "always_run" {
  input = timestamp()
}

resource "vultr_dns_domain" "my_domain" {
  domain = var.hostname
  ip     = vultr_instance.nodejs_server.main_ip
}

# Block storage is for storing ssl certificates only
# This was implemented to circumvent rate limiting on certificate requests
resource "vultr_block_storage" "my_block_storage" {
  attached_to_instance = vultr_instance.nodejs_server.id
  region               = var.region
  block_type           = "high_perf"
  size_gb              = 1
  lifecycle {
    prevent_destroy = true
  }
}

I get this error:

Error: error attaching block storage (9e616948-a562-47d4-8876-3fe205d4fb3d): {"error":"unable to attach: Server is currently locked","status":400}

  with vultr_block_storage.my_block_storage
  on vultr_instance.tf line 43, in resource "vultr_block_storage" "my_block_storage":
  resource "vultr_block_storage" "my_block_storage" {

To Reproduce

I believe the only relevant factor is creating an instance and attaching a block storage device to it in the same apply.

This sometimes succeeds, however.

Expected behavior

The attach operation should wait until the server is ready (not locked).
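
A possible stopgap would be to insert an arbitrary delay between instance creation and the attachment, e.g. with the hashicorp/time provider. A rough, untested sketch (the 60s value is a guess, and this only masks the underlying race rather than fixing it):

resource "time_sleep" "wait_for_unlock" {
  # Re-run the delay whenever the instance is replaced
  triggers = {
    instance_id = vultr_instance.nodejs_server.id
  }
  create_duration = "60s"
}

resource "vultr_block_storage" "my_block_storage" {
  # Give the instance time to (hopefully) leave the locked state before attaching
  depends_on           = [time_sleep.wait_for_unlock]
  attached_to_instance = vultr_instance.nodejs_server.id
  region               = var.region
  block_type           = "high_perf"
  size_gb              = 1
  lifecycle {
    prevent_destroy = true
  }
}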


thehonker commented 3 months ago

Also seeing this on OpenTofu v1.8.1 with registry.terraform.io/vultr/vultr v2.21.0. I tried adding the provisioner stanzas shown below as a workaround, which makes things somewhat more reliable, but we often end up needing to re-apply after a short delay. I suspect the workaround is only acting as a delay, and that delay sometimes isn't long enough.

resource "vultr_instance" "control_plane_instance" {
  depends_on = [ random_id.control_plane_node_id, vultr_vpc2.vpc2 ]

  for_each = { for i, v in random_id.control_plane_node_id: i => v }

  plan = var.CONTROL_PLANE_VM_PLAN
  region = var.REGION
  os_id = var.OS_ID
  label = "${var.CLUSTER_ID}-control-plane-${each.value.hex}"
  hostname = "${each.value.hex}"
  backups = "disabled"
  firewall_group_id = var.FIREWALL_GROUP_ID
  tags = ["${var.CLUSTER_ID}-control-plane"]
  ssh_key_ids = [var.SSH_KEY_IDS]
  enable_ipv6 = true

  provisioner "local-exec" {
    command = "until ping -c1 ${self.main_ip} >/dev/null 2>&1; do sleep 5; done;"
  }

  provisioner "remote-exec" {
    connection {
      host = self.main_ip
      user = "root"
      private_key = file("~/.ssh/id_ed25519.devenv")
    }
    inline = ["echo 'connected!'"]
  }
}

EDIT - After adding another sleep, our issue now appears to be that the server is locked when attaching multiple block storage volumes in one shot.

# Pause for 120s to allow all servers to become unlocked
resource "time_sleep" "wait_120_seconds" {
  create_duration = "120s"
  destroy_duration = "120s"
}

# Provision and attach blockstorage
# Blockstorage for k8s-internal ceph cluster on control plane nodes
resource "vultr_block_storage" "control_plane_instance" {
  depends_on = [ time_sleep.wait_120_seconds, vultr_instance.control_plane_instance ]

  count = length(vultr_instance.control_plane_instance) * var.CONTROL_PLANE_CEPH_BLOCK_COUNT
  label = "${vultr_instance.control_plane_instance[floor(count.index / var.CONTROL_PLANE_CEPH_BLOCK_COUNT)].label}"
  size_gb = var.CONTROL_PLANE_CEPH_BLOCK_SIZE
  region = var.REGION
  attached_to_instance = vultr_instance.control_plane_instance[floor(count.index / var.CONTROL_PLANE_CEPH_BLOCK_COUNT)].id
  block_type = var.BLOCK_TYPE
  live = true
}

Thus I think this is a more general issue centered on server lock status, and not a simple race condition on subscription (instance) creation.
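
Since a fixed sleep is only a guess at how long the lock lasts, a less fragile workaround might be to poll each instance's server_status via the API until it is no longer "locked" and gate the block storage on that. A rough, untested sketch; it assumes the v2 GET /v2/instances/{instance-id} response exposes instance.server_status as described in the Vultr API docs, and that curl, jq, and a VULTR_API_KEY environment variable are available wherever tofu runs:

# Untested sketch: block until each control plane instance stops reporting
# server_status == "locked" before any block storage attach is attempted.
resource "terraform_data" "wait_for_unlock" {
  for_each = vultr_instance.control_plane_instance

  provisioner "local-exec" {
    command = <<-EOT
      until [ "$(curl -s -H "Authorization: Bearer $VULTR_API_KEY" \
        https://api.vultr.com/v2/instances/${each.value.id} \
        | jq -r '.instance.server_status')" != "locked" ]; do
        sleep 10
      done
    EOT
  }
}

The vultr_block_storage.control_plane_instance resource would then use depends_on = [ terraform_data.wait_for_unlock ] in place of the time_sleep.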