hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.29k stars 1.72k forks source link

google_compute_instance resource churn when setting network_interface.subnetwork to a subnetwork name #19339

Closed peikk0 closed 1 week ago

peikk0 commented 1 week ago

Community Note

Terraform Version & Provider Version(s)

Terraform v1.9.5 on amd64

Affected Resource(s)

google_compute_instance

Terraform Configuration

resource "google_compute_instance" "my-instance" {
  name = "my-instance"
  ...

  network_interface {
    subnetwork = data.google_compute_subnetwork.my-subnet.name
  }
}

Debug Output

No response

Expected Behavior

No diff for existing google_compute_instance resources.

Actual Behavior

  # google_compute_instance.my-instance will be updated in-place
  ~ resource "google_compute_instance" "my-instance" {
        id                        = "projects/my-project/zones/us-central1-b/instances/my-instance"
        name                      = "my-instance"
        # (23 unchanged attributes hidden)
      ~ network_interface {
            name                        = "nic0"
          ~ subnetwork                  = "https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/my-subnet" -> "my-subnet"
            # (9 unchanged attributes hidden)
        }
        # (5 unchanged blocks hidden)
    }

Steps to reproduce

  1. terraform apply
  2. terraform plan

Important Factoids

No diff with v5.42.0, there was a diff too with v5.43.0 though.

References

Subnetwork names are a valid value according to the documentation: https://registry.terraform.io/providers/hashicorp/google/5.43.1/docs/resources/compute_instance#subnetwork

b/364903020

karolgorc commented 1 week ago

The data.google_compute_subnetwork.my-subnet.self_link is using name instead of a self_link as input for some reason.

 ~ subnetwork                  = "https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/my-subnet" -> "my-subnet"

Couldn't simply recreate this with something like

data "google_compute_subnetwork" "default" {
  name = "default"
  project = var.project_id
}

resource "google_compute_instance" "default" {
  name         = "test"
  machine_type = "n2-standard-2"
  zone         = var.zone
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    subnetwork = data.google_compute_subnetwork.default.self_link
  }
}

Doing something like this shows all of the expected values

output "data" {
  value = data.google_compute_subnetwork.default.self_link
}

output "instance" {
  value = google_compute_instance.default.network_interface.0.subnetwork
}

Tried to replicate this on newest provider versions and on 5.43.1 and couldn't hit this issue. Juding by this I'd say that this is an issue caused by something in your environment. Most likely the fact that your data block is returing name instead of self_link

Have you tried applying this configuration? If so does the diff persist? Is this a newly created terraform configuration or did it happen while upgrading the provider version? Can you provide a debug log of a terraform run? You can do it by setting a TF_LOG variable to DEBUG

TF_LOG=DEBUG terraform plan
TF_LOG=DEBUG terraform apply
#etc.
peikk0 commented 1 week ago

@karolgorc oh yes you're right, it's actually the other way around (I'll put that on working late πŸ˜‚), we are actually setting it to the name of the subnet (with subnetwork = google_compute_subnetwork.subnetwork.name), it seems to be stored as the resolved self-link in the TF state when applying, and so we get a diff self_link -> name on the next plan again. I'll update the title/description.

Have you tried applying this configuration? If so does the diff persist?

Yes, we get the diff again after applying.

Is this a newly created terraform configuration or did it happen while upgrading the provider version?

It's an existing configuration, it happened when upgrading from v5.42 to v5.43.

Can you provide a debug log of a terraform run? You can do it by setting a TF_LOG variable to DEBUG

I'll look at the debug logs tomorrow. πŸ™πŸ»

karolgorc commented 1 week ago

A quick fix that will get rid of the diff would be to replace google_compute_subnetwork.subnetwork.name with google_compute_subnetwork.subnetwork.self_link. This should produce an empty plan because then the values will be equal.

This happens because the Google API translates the name to self_link outside of terraform when an Instance is created and we suppress that diff here.

"subnetwork": {
  Type:             schema.TypeString,
  Optional:         true,
  Computed:         true,
  DiffSuppressFunc: tpgresource.CompareSelfLinkOrResourceName, // <----- here
  Description:      `The name or self_link of the subnetwork attached to this interface.`,
},

This suppression works when i'm trying to simulate this provider upgrade and produces an empty plan. My only guess for why it doesn't happen on your configuration is that somehow the state got corrupted when upgrading, not really sure tough.

I wonder if you can reproduce this in a sterile environment with a clean configuration. If that quick fix isn't enough for you then providing a clean config with steps to recreate the issue would be ideal πŸ™‚

peikk0 commented 1 week ago

Ok I managed to reproduce it with a simple example, it seems to happen only if you also set network_interface.network_ip = "".

terraform {
  required_version = "~> 1.9"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.43.0"
    }
  }
}

variable "project_id" {
  type    = string
  default = "my-project"
}

variable "region" {
  type    = string
  default = "us-east1"
}

variable "zone" {
  type    = string
  default = "us-east1-b"
}

provider "google" {
  project = var.project_id
}

data "google_compute_subnetwork" "default" {
  name    = "default"
  project = var.project_id
  region  = var.region
}

resource "google_compute_instance" "default" {
  name         = "test"
  machine_type = "n2-standard-2"
  zone         = var.zone
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    subnetwork = data.google_compute_subnetwork.default.name
    network_ip = ""
  }
}
❯ terraform apply
data.google_compute_subnetwork.default: Reading...
data.google_compute_subnetwork.default: Read complete after 1s [id=projects/my-project/regions/us-east1/subnetworks/default]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_compute_instance.default will be created
  + resource "google_compute_instance" "default" {
      + can_ip_forward       = false
      + cpu_platform         = (known after apply)
      + current_status       = (known after apply)
      + deletion_protection  = false
      + effective_labels     = (known after apply)
      + guest_accelerator    = (known after apply)
      + id                   = (known after apply)
      + instance_id          = (known after apply)
      + label_fingerprint    = (known after apply)
      + machine_type         = "n2-standard-2"
      + metadata_fingerprint = (known after apply)
      + min_cpu_platform     = (known after apply)
      + name                 = "test"
      + project              = "my-project"
      + self_link            = (known after apply)
      + tags_fingerprint     = (known after apply)
      + terraform_labels     = (known after apply)
      + zone                 = "us-east1-b"

      + boot_disk {
          + auto_delete                = true
          + device_name                = (known after apply)
          + disk_encryption_key_sha256 = (known after apply)
          + kms_key_self_link          = (known after apply)
          + mode                       = "READ_WRITE"
          + source                     = (known after apply)

          + initialize_params {
              + image                  = "debian-cloud/debian-11"
              + labels                 = (known after apply)
              + provisioned_iops       = (known after apply)
              + provisioned_throughput = (known after apply)
              + size                   = (known after apply)
              + type                   = (known after apply)
            }
        }

      + confidential_instance_config (known after apply)

      + network_interface {
          + internal_ipv6_prefix_length = (known after apply)
          + ipv6_access_type            = (known after apply)
          + ipv6_address                = (known after apply)
          + name                        = (known after apply)
          + network                     = (known after apply)
          + network_ip                  = (known after apply)
          + stack_type                  = (known after apply)
          + subnetwork                  = "default"
          + subnetwork_project          = (known after apply)
        }

      + reservation_affinity (known after apply)

      + scheduling (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_compute_instance.default: Creating...
google_compute_instance.default: Still creating... [10s elapsed]
google_compute_instance.default: Still creating... [20s elapsed]
google_compute_instance.default: Creation complete after 22s [id=projects/my-project/zones/us-east1-b/instances/test]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
❯ terraform plan
data.google_compute_subnetwork.default: Reading...
data.google_compute_subnetwork.default: Read complete after 2s [id=projects/my-project/regions/us-east1/subnetworks/default]
google_compute_instance.default: Refreshing state... [id=projects/my-project/zones/us-east1-b/instances/test]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_compute_instance.default will be updated in-place
  ~ resource "google_compute_instance" "default" {
        id                   = "projects/my-project/zones/us-east1-b/instances/test"
        name                 = "test"
        tags                 = []
        # (22 unchanged attributes hidden)

      ~ network_interface {
            name                        = "nic0"
          ~ subnetwork                  = "https://www.googleapis.com/compute/v1/projects/my-project/regions/us-east1/subnetworks/default" -> "default"
            # (9 unchanged attributes hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

No diff if I remove network_ip = "". And I've verified again that it doesn't happen with v5.42, only with v5.43 and also v6.0.

peikk0 commented 1 week ago

So it's likely related to https://github.com/hashicorp/terraform-provider-google/pull/19135

peikk0 commented 1 week ago

It works fine with setting it to null (and I'm fixing our modules to do that), it's only happening when set to an empty string.

karolgorc commented 1 week ago

Ok, i can reproduce the issue now. Will check out the conditions that make this happen.

Also tested out the quick fix I've provided earlier. Providing self_link does work if you just need to get rid of this diff for now image

Also the network_ip parameter is optional. You can just remove it and it will get rid of the diff as well image

peikk0 commented 1 week ago

Also the network_ip parameter is optional. You can just remove it and it will get rid of the diff as well

Yes, we actually have a conditional value on it, which is why it was set to "" in those cases:

  network_interface {
    subnetwork         = var.subnetwork_name != "" ? var.subnetwork_name : one(google_compute_subnetwork.subnetwork[*].name)
    subnetwork_project = var.project
    network_ip         = var.static_private_ip ? google_compute_address.static-ip-address[count.index].address : ""
  }
ggtisc commented 1 week ago
resource "google_compute_instance" "default" {
  name         = "test"
  machine_type = "n2-standard-2"
  zone         = var.zone
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    subnetwork = data.google_compute_subnetwork.default.name
    network_ip = ""
  }
}

Confirmed permadiff issue with the shared configuration

karolgorc commented 1 week ago

EDIT: found the issue in our code.

not really sure how to explain it but it works after some changes πŸ˜„

  1. The forceNewIfNetworkIPNotUpdatableFunc is executed as a part of CustomizeDiff
  2. network_ip recognizes a change for some reason when set to "" and is set as a ForceNew field
  3. But the API doesn't recognize the change and there aren't any parameters returned from the API that could cause the ForceNew
  4. This overrides all of the DiffSuppress functions and the other fields cause diff

Fixes to apply until the PR is merged

Will submit a PR with the fix in a few moments