lxc / terraform-provider-incus

Incus provider for Terraform/OpenTofu
https://linuxcontainers.org/incus
Mozilla Public License 2.0

Bug: race condition with using for_each in incus_resource #93

Open tregubovav-dev opened 1 week ago

tregubovav-dev commented 1 week ago

Issue

A race condition can occur when multiple instances are created on different target nodes in a cluster using the same image name. Creation of some instances fails with the error: Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint.

Steps to reproduce

  1. Deploy an Incus cluster with 3+ nodes and shared storage (I use a 7-node RPI4 cluster with Ceph storage).
  2. Deploy the configuration below:
locals {
  instances = {
    "inst-01" = {
      target = "node-01"
    },
    "inst-02" = {
      target = "node-02"
    },
    "inst-03" = {
      target = "node-03"
    },
    "inst-04" = {
      target = "node-04"
    },
    "inst-05" = {
      target = "node-05"
    }
  }
}

resource "incus_instance" "app_dns_instance" {
  for_each = local.instances

  project          = "test"
  image            = "images:alpine/edge"
  target           = each.value.target
  name             = each.key
  wait_for_network = false

  device {
    type = "disk"
    name = "root"
    properties = {
      path = "/"
      pool = "remote"
    }
  }
}

In my environment, two to four of the instance deployments fail with the error: Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint.

Below is the worst-case output, where the image was deleted before it could be cloned for the inst-03 root device:

Plan: 5 to add, 0 to change, 0 to destroy.
incus_instance.app_dns_instance["inst-03"]: Creating...
incus_instance.app_dns_instance["inst-02"]: Creating...
incus_instance.app_dns_instance["inst-01"]: Creating...
incus_instance.app_dns_instance["inst-04"]: Creating...
incus_instance.app_dns_instance["inst-05"]: Creating...
╷
│ Error: Failed to create instance "inst-05"
│
│   with incus_instance.app_dns_instance["inst-05"],
│   on main.tf line 21, in resource "incus_instance" "app_dns_instance":
│   21: resource "incus_instance" "app_dns_instance" {
│
│ Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint
╵
╷
│ Error: Failed to create instance "inst-04"
│
│   with incus_instance.app_dns_instance["inst-04"],
│   on main.tf line 21, in resource "incus_instance" "app_dns_instance":
│   21: resource "incus_instance" "app_dns_instance" {
│
│ Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint
╵
╷
│ Error: Failed to create instance "inst-03"
│
│   with incus_instance.app_dns_instance["inst-03"],
│   on main.tf line 21, in resource "incus_instance" "app_dns_instance":
│   21: resource "incus_instance" "app_dns_instance" {
│
│ Failed instance creation: Failed creating instance from image: Failed to run: rbd --id admin --cluster ceph --image-feature layering clone
│ lxd/image_45ec164abe54425db3622fada8f2bd639313efff8cdc14298a9d16cbab0dd835_ext4@readonly lxd/container_test_inst-03: exit status 2 (2024-06-27T14:10:19.732-0700 ffff82a0a3c0 -1
│ librbd::image::OpenRequest: failed to find snapshot readonly
│ 2024-06-27T14:10:19.733-0700 ffff7560a3c0 -1 librbd::image::CloneRequest: 0xaaaaf0c92640 handle_open_parent: failed to open parent image: (2) No such file or directory
│ rbd: clone error: (2) No such file or directory)
╵
╷
│ Error: Failed to create instance "inst-02"
│
│   with incus_instance.app_dns_instance["inst-02"],
│   on main.tf line 21, in resource "incus_instance" "app_dns_instance":
│   21: resource "incus_instance" "app_dns_instance" {
│
│ Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint
╵
╷
│ Error: Failed to create instance "inst-01"
│
│   with incus_instance.app_dns_instance["inst-01"],
│   on main.tf line 21, in resource "incus_instance" "app_dns_instance":
│   21: resource "incus_instance" "app_dns_instance" {
│
│ Failed instance creation: Failed creating instance from image: Error inserting volume "45ec164abe54425db3622fada8f2bd639313efff8cdc14298a9d16cbab0dd835" for project "default" in pool
│ "remote" of type "images" into database "UNIQUE constraint failed: index 'storage_volumes_unique_storage_pool_id_node_id_project_id_name_type'"

stgraber commented 1 week ago

That's an Incus issue more than an issue with the provider but that should make it pretty easy to reproduce.
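
Until this is addressed in Incus itself, one possible workaround on the Terraform side is to create a single "seed" instance first, so the image is downloaded and recorded once, and to make the remaining instances wait for it with depends_on. The following is an untested sketch based on the configuration from this issue. The names seed_key, rest, and the resource labels "seed" and "rest" are illustrative assumptions, not provider features, and the remaining instances are still created in parallel, so this may not remove every race.

```hcl
# Hypothetical restructuring of the configuration from this issue.
locals {
  instances = {
    "inst-01" = { target = "node-01" }
    "inst-02" = { target = "node-02" }
    "inst-03" = { target = "node-03" }
  }

  # Pick one instance to create first so the image is fetched only once.
  seed_key = sort(keys(local.instances))[0]

  # All other instances; these wait for the seed instance.
  rest = { for name, cfg in local.instances : name => cfg if name != local.seed_key }
}

resource "incus_instance" "seed" {
  project          = "test"
  image            = "images:alpine/edge"
  target           = local.instances[local.seed_key].target
  name             = local.seed_key
  wait_for_network = false

  device {
    type = "disk"
    name = "root"
    properties = {
      path = "/"
      pool = "remote"
    }
  }
}

resource "incus_instance" "rest" {
  for_each = local.rest

  project          = "test"
  image            = "images:alpine/edge"
  target           = each.value.target
  name             = each.key
  wait_for_network = false

  device {
    type = "disk"
    name = "root"
    properties = {
      path = "/"
      pool = "remote"
    }
  }

  # Terraform starts these only after the seed instance has been created.
  depends_on = [incus_instance.seed]
}
```

Splitting the resource changes the resource addresses, so existing state would have to be moved. A less invasive alternative is to serialize all operations with terraform apply -parallelism=1, at the cost of a slower apply.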