hashicorp / packer-plugin-openstack

Packer plugin for OpenStack Builder
https://www.packer.io/docs/builders/openstack
Mozilla Public License 2.0

Build fails with error "Resource not found" after Packer terminates the source server #105

Closed by MuriloKomirchuk 8 months ago

MuriloKomirchuk commented 1 year ago

Overview of the Issue

In v1.1.1, the build fails just after Packer terminates the source server. The server is terminated, but Packer returns the error "Error getting server to terminate: Resource not found".

Reproduction Steps

Simply run a build.

Plugin and Packer version

Packer version: 1.8.6
packer-plugin-openstack: v1.1.1

Simplified Packer Buildfile

This buildfile uses a block storage volume, but I have also tested without a volume and got the same result.

packer {
  required_plugins {
    openstack = {
      version = "1.1.1"
      source = "github.com/hashicorp/openstack"
    }
  }
}

source "openstack" "ol8" {
  username                 = "OS_USERNAME"
  password                 = "OS_PASSWORD"
  identity_endpoint        = "OS_AUTH_URL"
  tenant_id                = "OS_TENANT_ID"
  domain_id                = "OS_DOMAIN_ID"
  region                   = "OS_REGION_NAME"
  insecure                 = true
  source_image             = "xxxxxxxxxxxx"
  flavor                   = "xxxxxxxxxxx"
  config_drive             = true
  networks                 = local.networks
  use_blockstorage_volume  = true
  volume_size              = 20
  image_disk_format        = "qcow2"
  image_name               = "test"
  image_visibility         = "shared"
  image_min_disk           = 20
  ssh_username             = "cloud-user"
  instance_name            = local.instance_name
  ssh_read_write_timeout   = "5m"
}

build {
  name = "OL8 Custom Image"
  sources = ["source.openstack.ol8"]

  provisioner "shell" {
    remote_folder = "/home/cloud-user"
    inline = [
      "sudo timedatectl set-timezone UTC",
      "sudo dnf upgrade -y",
      "sudo reboot"
    ]
    expect_disconnect = true
  }

  provisioner "ansible" {
    pause_before       = "60s"
    galaxy_file        = var.galaxy_file
    playbook_file      = var.playbook_file
    use_proxy          = var.use_proxy
    extra_arguments    = var.extra_arguments
  }
}

Operating system and Environment details

alpine 3.17.2

Log Fragments and crash.log files

Log

drew-viles commented 1 year ago

This needs a bump as it's ongoing and untouched since July.

I'm using it as part of the Image Builder project over on Kubernetes, and it had been working fine until recently. Nothing has changed in Image Builder (I'm the one who contributed the code for the OpenStack builds), yet from one day to the next this started happening. I was using the same version as the previous day, but suddenly every build fails with this same error.

It seems that Packer tries to terminate the source server twice because it thinks there was an error:

==> openstack: Stopping server: 09406f96-b222-4eff-8585-01c2655ea4fd ...
    openstack: Waiting for server to stop: 09406f96-b222-4eff-8585-01c2655ea4fd ...
==> openstack: Terminating the source server: 09406f96-b222-4eff-8585-01c2655ea4fd ...     <------ FIRST ATTEMPT HERE IS SUCCESSFUL - SERVER IS DELETED
==> openstack: Creating the image: eck-231019-6f19e1ed
    openstack: Image: fbd968fb-63a3-46df-8dd4-7a2a0484fb10
==> openstack: Waiting for image eck-231019-6f19e1ed (image id: fbd968fb-63a3-46df-8dd4-7a2a0484fb10) to become ready...
==> openstack: Updating image tags to k8s, capi
==> openstack: Updating image visibility to public
==> openstack: Provisioning step had errors: Running the cleanup provisioner, if present... <------ IT THINKS AN ERROR OCCURS HERE EVEN THOUGH IT DOESN'T - IF IT DOES, TELL US WHAT
==> openstack: Deleted temporary floating IP '2d4f1de9-9b6e-47f0-b031-f805ecc2c8e4' (192.168.199.175)
==> openstack: Terminating the source server: 09406f96-b222-4eff-8585-01c2655ea4fd ...      <------ THEN IT TRIES AGAIN
==> openstack: Error terminating server, may still be around: Resource not found
==> openstack: Deleting volume: 1dfc3a05-45bc-485a-b6da-c881a5b2d08d ...
==> openstack: Deleting temporary keypair: packer_653137c3-8ad7-cb9c-bfba-c8e6c6a86c97 ...

I'm going to continue digging on my side to make sure no errors occurred, but from what I can see all resources are removed successfully and the image builds just fine.
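To check whether other builds show the same double-termination pattern, a saved Packer log can be scanned for repeated "Terminating the source server" lines. The following is a minimal sketch, assuming the log format shown above; the function name `repeated_terminations` and the sample text are illustrative and not part of Packer or the plugin.

```python
import re
from collections import Counter

# Matches the Packer UI line seen in the log above; the server ID is a UUID.
TERMINATE_RE = re.compile(r"Terminating the source server: ([0-9a-f-]{36})")


def repeated_terminations(log_text):
    """Return {server_id: count} for servers whose termination was logged more than once."""
    counts = Counter(TERMINATE_RE.findall(log_text))
    return {server_id: n for server_id, n in counts.items() if n > 1}


# Sample lines copied from the log fragment above (annotations removed).
sample = """\
==> openstack: Terminating the source server: 09406f96-b222-4eff-8585-01c2655ea4fd ...
==> openstack: Creating the image: eck-231019-6f19e1ed
==> openstack: Terminating the source server: 09406f96-b222-4eff-8585-01c2655ea4fd ...
==> openstack: Error terminating server, may still be around: Resource not found
"""

print(repeated_terminations(sample))
```

An empty result means each server ID was terminated at most once in that log.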

drew-viles commented 1 year ago

It looks like it could be a glance/cinder problem, so I will test to confirm.

drew-viles commented 1 year ago

Okay, on further investigation it looks likely that there is a problem with the OpenStack side of things as we're seeing this: https://bugs.launchpad.net/kolla-ansible/+bug/1991516

I'm happy that, from my side, this isn't a bug at all but is instead an issue with OpenStack and how glance/cinder are managing the upload from volume.
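Whatever the upstream cause, the "Resource not found" message in the log comes from the second termination attempt hitting a server that has already been deleted. The sketch below illustrates the general idea of a cleanup step that treats "not found" as "already gone". It is an assumption-laden illustration, not the plugin's actual code: `NotFoundError`, `FakeCompute`, and `terminate_if_present` are hypothetical names, and `FakeCompute` is an in-memory stand-in rather than a real OpenStack client.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for an API 'Resource not found' (404) error."""


class FakeCompute:
    """In-memory stand-in for a compute API; not a real OpenStack client."""

    def __init__(self, server_ids):
        self.servers = set(server_ids)

    def delete_server(self, server_id):
        if server_id not in self.servers:
            raise NotFoundError("Resource not found")
        self.servers.remove(server_id)


def terminate_if_present(compute, server_id):
    """Delete the server; return False instead of failing if it is already gone."""
    try:
        compute.delete_server(server_id)
        return True
    except NotFoundError:
        # An earlier step already deleted it, so the desired end state holds.
        return False


server_id = "09406f96-b222-4eff-8585-01c2655ea4fd"
compute = FakeCompute({server_id})

first = terminate_if_present(compute, server_id)   # server exists and is deleted
second = terminate_if_present(compute, server_id)  # already gone, no error raised
print(first, second)
```

With this approach a repeated cleanup pass does not report a failure for a resource that no longer exists, while any other error would still propagate.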

nywilken commented 8 months ago

Looking at the comments, it seems like this might be an upstream issue. Does the fix in #118 help this situation?

drew-viles commented 8 months ago

I shall test it on my end as soon as possible. I actually chatted with John about this on Slack a while back, as we were seeing similar issues, so it may well be the fix we're looking for.