hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Resource "nomad_external_volume" destruction fails because volume is still not detached #33419

Closed: Puskal07 closed this issue 1 year ago

Puskal07 commented 1 year ago

Terraform Version

1.3.6

Explanation

I have Portworx installed in my Nomad cluster to manage volumes, along with the Portworx CSI plugin, so I am using Portworx to provision CSI volumes for my Nomad jobs. I use the following data source block to connect to the Portworx plugin in my cluster:

data "nomad_plugin" "pwx" {
  plugin_id            = "portworx"
  wait_for_healthy = true
}

Next, I declare a resource block to create a CSI volume with the Portworx plugin, using the following configuration:

resource "nomad_external_volume" "cbs_acme_nomad_csi_volume" {
  depends_on    = [data.nomad_plugin.pwx]
  namespace     = var.namespace
  type                = "csi"
  plugin_id        = "portworx"
  volume_id      = "my-volume"
  name              = "my-volume"
  capacity_min = "5G"
  capacity_max = 10G"
  capability {
    access_mode         = "multi-node-single-writer"
    attachment_mode = "file-system"
  }
  mount_options {
    fs_type = "ext4"
  }
  parameters = {
    io_priority = "high"
    repl           = "2"
  }
}

Finally, I mount this volume in my Nomad job, which is also deployed through Terraform, so the job has an implicit dependency on this volume (sketched below, after the provisioner block). During terraform destroy the Nomad job is destroyed successfully, but destruction of the nomad_external_volume fails because the volume is still not detached. However, if I add the following provisioner block inside the nomad_external_volume resource, it works fine:

provisioner "local-exec" {
    command = "sleep 5"
    when    = destroy
  }
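
For context, in my setup the implicit dependency comes from interpolating the volume into the jobspec, roughly like this (simplified; the job resource name and template path are placeholders):

resource "nomad_job" "app" {
  # Referencing the volume resource here is what creates the implicit
  # dependency, so Terraform destroys the job before the volume.
  jobspec = templatefile("${path.module}/app.nomad.tpl", {
    volume_id = nomad_external_volume.cbs_acme_nomad_csi_volume.volume_id
  })
}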

But is there a workaround that avoids adding a local-exec provisioner that sleeps for 5s during destroy? Ideally, after the Nomad job is destroyed, Terraform would wait for the volume to be detached before destroying the nomad_external_volume.
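
One alternative I have been considering is the hashicorp/time provider's time_sleep resource with a destroy_duration, inserted between the job and the volume in the dependency graph; a rough sketch (untested against this setup, and 30s is an arbitrary choice):

# Requires the hashicorp/time provider in required_providers.
resource "time_sleep" "wait_for_detach" {
  # Destroyed after the job but before the volume, so the pause lands
  # between "job stopped" and "volume deleted".
  depends_on       = [nomad_external_volume.cbs_acme_nomad_csi_volume]
  destroy_duration = "30s"
}

The nomad_job resource would then also declare depends_on = [time_sleep.wait_for_detach], making the destroy order: job, then the 30s pause, then the volume. This is still a fixed delay rather than an actual detach check, though.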

Error Output

Error: error deleting volume: Unexpected response code: 500 (rpc error: controller delete volume: CSI.ControllerDeleteVolume: rpc error: code = Aborted desc = Unable to delete volume with id 737164885392674036: rpc error: code = Internal desc = Failed to delete volume 737164885392674036: rpc error: code = Internal desc = Failed to detach volume 737164885392674036: Volume 737164885392674036 is mounted at 2 location(s): /var/lib/osd/pxns/737164885392674036, /var/lib/csi/publish/per-alloc/dffe90c4-40d2-1d38-8b11-8bac08e758b3/my-volume/rw-file-system-multi-node-single-writer)

Expected Behavior

During terraform destroy, Terraform should first detach the volume, or wait for it to be detached, and only then destroy it; no error should occur.

Actual Behavior

During terraform destroy, I get the above-mentioned error.

My guess is that this happens because the volume is still attached when Terraform tries to destroy it. Could someone take a look at this issue so that the volume is detached first and then destroyed?
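
A more targeted workaround might be a destroy-time provisioner inside the nomad_external_volume resource that polls the Nomad CLI until the volume reports no attached allocations, instead of a blind sleep. This is only a sketch: it assumes the nomad CLI and NOMAD_ADDR are available where Terraform runs, and the exact output text being grepped is a guess.

provisioner "local-exec" {
  when    = destroy
  # Poll for up to ~60s (12 attempts, 5s apart) until Nomad reports the
  # volume has no allocations placed, then let the destroy proceed.
  command = <<-EOT
    for i in $(seq 1 12); do
      if nomad volume status -namespace=${self.namespace} ${self.volume_id} | grep -q "No allocations placed"; then
        exit 0
      fi
      sleep 5
    done
    echo "volume ${self.volume_id} still attached" >&2
    exit 1
  EOT
}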

Steps to Reproduce

  1. terraform init
  2. terraform plan
  3. terraform apply
  4. terraform destroy
jbardin commented 1 year ago

Hello,

This appears to be a question or an issue with a provider, not with Terraform itself. You can check existing issues and file a new one in the provider's project repository, linked from its registry page. If you have questions about Terraform or the provider, it's better to use the community forum, where there are more people ready to help. The GitHub issues here are monitored by only a few core maintainers.

Thanks!

github-actions[bot] commented 10 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.