hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

Resource state not refreshed before destroy #35568

Open jooola opened 3 months ago

jooola commented 3 months ago

We maintain a terraform provider that is failing to delete a "primary IP" resource in certain circumstances.

Before we can delete a primary IP resource, we have to unassign it from a server instance in our API.

During our tests of the delete call, we noticed that the state of the primary IP resource references an invalid server instance. As a result, the provider tries to unassign the primary IP from an instance it is not assigned to, and the primary IP resource deletion fails.

The bug only happens with terraform >=1.9, so we ran a git bisect starting from the last known working terraform version (1.8.5). We found the problematic commit to be 460c7f3933115c3edf670caacd2ffa489ef4eeb8 https://github.com/hashicorp/terraform/pull/35467

Reverting that commit on top of the v1.9 branch fixes our issue.

Terraform Version

Terraform v1.9.4
on linux_amd64

Terraform Configuration Files

On https://github.com/hetznercloud/terraform-provider-hcloud/tree/tf-1.9-primary-ip-delete

Debug Output

TF_ACC=1 go test ./internal/server -run TestServerResource_PrimaryIPTests -v -timeout=30m -parallel=8
=== RUN   TestServerResource_PrimaryIPTests
    resource_test.go:908: 

        HCL:
            1: resource "hcloud_ssh_key" "server" {
            2:   name        = "server--665592047586110587"
            3:   public_key  = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
            4:     labels = {
            5:       key = "4999487623275307476"
            6:   }
            7: }
            8: 
            9: resource "hcloud_primary_ip" "primary-ip-v4" {
           10:   name        = "primaryip-v4-test--665592047586110587"
           11:   type = "ipv4"
           12:   datacenter       = "nbg1-dc3"
           13:   
           14:   assignee_type       = "server"
           15:   
           16:   auto_delete       = false
           17: }
           18: 
           19: resource "hcloud_server" "server-primaryIP-test" {
           20:   name        = "server-primaryIP-test--665592047586110587"
           21:   server_type = "cpx11"
           22:   image       = "ubuntu-24.04"
           23:   datacenter  = "nbg1-dc3"
           24:   
           25:   ssh_keys    = [hcloud_ssh_key.server.id]
           26:   
           27:   public_net {
           28:     ipv4 = hcloud_primary_ip.primary-ip-v4.id
           29:     ipv4_enabled = true
           30:     ipv6_enabled = false
           31:   }
           32:   
           33: 
           34:   
           35: }

    resource_test.go:927: 

        HCL:
            1: resource "hcloud_ssh_key" "server" {
            2:   name        = "server--665592047586110587"
            3:   public_key  = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
            4:     labels = {
            5:       key = "4999487623275307476"
            6:   }
            7: }
            8: 
            9: resource "hcloud_primary_ip" "primary-ip-v4" {
           10:   name        = "primaryip-v4-test--665592047586110587"
           11:   type = "ipv4"
           12:   datacenter       = "nbg1-dc3"
           13:   
           14:   assignee_type       = "server"
           15:   
           16:   auto_delete       = false
           17: }
           18: 
           19: resource "hcloud_server" "server-primaryIP-test" {
           20:   name        = "server-primaryIP-test--665592047586110587"
           21:   server_type = "cpx11"
           22:   image       = "ubuntu-24.04"
           23:   datacenter  = "nbg1-dc3"
           24:   
           25:   ssh_keys    = [hcloud_ssh_key.server.id]
           26:   
           27: 
           28:   
           29: }

    resource_test.go:946: 

        HCL:
            1: resource "hcloud_ssh_key" "server" {
            2:   name        = "server--665592047586110587"
            3:   public_key  = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
            4:     labels = {
            5:       key = "4999487623275307476"
            6:   }
            7: }
            8: 
            9: resource "hcloud_primary_ip" "primary-ip-v6" {
           10:   name        = "primaryip-v6-test--665592047586110587"
           11:   type = "ipv6"
           12:   datacenter       = "nbg1-dc3"
           13:   
           14:   assignee_type       = "server"
           15:   
           16:   auto_delete       = false
           17: }
           18: 
           19: resource "hcloud_server" "server-primaryIP-test" {
           20:   name        = "server-primaryIP-test--665592047586110587"
           21:   server_type = "cpx11"
           22:   image       = "ubuntu-24.04"
           23:   datacenter  = "nbg1-dc3"
           24:   
           25:   ssh_keys    = [hcloud_ssh_key.server.id]
           26:   
           27:   public_net {
           28:     ipv4_enabled = false
           29:     ipv6 = hcloud_primary_ip.primary-ip-v6.id
           30:     ipv6_enabled = true
           31:   }
           32:   
           33: 
           34:   
           35: }

=== PAUSE TestServerResource_PrimaryIPTests
=== CONT  TestServerResource_PrimaryIPTests
    resource_test.go:901: Step 3/3 error: Error running apply: exit status 1

        Error: unexpected assignee id: state_assignee_id=51802107 primary_ip_id=67207997 server_ipv4_id=67208025 server_ipv6_id=67208029

--- FAIL: TestServerResource_PrimaryIPTests (77.92s)
FAIL
FAIL    github.com/hetznercloud/terraform-provider-hcloud/internal/server   77.934s
FAIL
make: *** [GNUmakefile:65: testacc] Error 1

Traces: https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/test-TestServerResource_PrimaryIPTests.log

Actual Behavior

The state of the primary IP resource references an invalid server instance. As a result, the provider tries to unassign the primary IP from an instance it is not assigned to, and the primary IP resource deletion fails.

Expected Behavior

We expect the state of the primary IP resource to be refreshed before the resource is destroyed.

Steps to Reproduce

git clone --branch tf-1.9-primary-ip-delete https://github.com/hetznercloud/terraform-provider-hcloud
cd terraform-provider-hcloud

export TF_LOG="trace"
export TF_LOG_PATH_MASK="test-%s.log"
make TEST="./internal/server" TESTARGS="-run TestServerResource_PrimaryIPTests" testacc


jbardin commented 3 months ago

Hi @jooola,

Thanks for filing the issue. I have not been able to replicate the issue from your config yet, and unfortunately the logs don't have any of the core trace output either. I'm wondering if this is something specific to the acceptance tests themselves. Can you run a standalone example from the command line? That would also give easier access to the core logs, especially if you use TF_LOG_CORE=trace to avoid all the plugin logging.

jooola commented 3 months ago

Sadly, we are having trouble making this reproducible outside our provider.

We were, however, able to reproduce the bug running only from the CLI.

I pushed the core trace logs and the backed-up states for each step to our debug branch. The failing step is step 5: https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/step5/step5.log

The error occurs at this line: https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/step5/step5.log#L1300

The previous steps can be found in https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/

To restate our problem: we are running the delete resource call using a state that is not up to date. Previously, the state was refreshed using the read resource call before the delete resource call was performed.

We are not sure if it is now our responsibility to pull fresh data ourselves in the delete call, or if terraform is the one responsible for refreshing the state before deleting the resource.