hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.36k stars 1.75k forks source link

Breaking change in 6.9.0 due to the introduction of google_compute_instance key_revocation_action_type #20081

Open hfsaito opened 3 weeks ago

hfsaito commented 3 weeks ago

Community Note

Terraform Version & Provider Version(s)

Terraform vX.X.X

Terraform vX.X.X

Affected Resource(s)

After this change https://github.com/hashicorp/terraform-provider-google/pull/19952 Existing google_compute_instance were forced to replace

  # google_compute_instance.sample must be replaced
-/+ resource "google_compute_instance" "sample" {
      ...
      - key_revocation_action_type = "NONE" -> null # forces replacement

Terraform Configuration

Create a google_compute_instance with Google Provider version 6.8.0 Run planning without any change other than the Google Provider version to 6.9.0 You'll see the planning recreating existing google_compute_instance

Debug Output

No response

Expected Behavior

Only touch/break existing resources between major version numbers

Actual Behavior

No response

Steps to reproduce

No response

Important Factoids

No response

References

No response

b/376700004

karolgorc commented 3 weeks ago

Couldn't recreate this. For me the upgrade goes normally.

Can you share your config? The field doesn't have a default value, nor is it computed from the API, so it shouldn't set itself, Unless it's computed only in certain scenarios and the API set the field for you. Still cannot hit this issue

BuckinghamIO commented 3 weeks ago

Just hit this issue on our setup also, forcing a replacement of a VM which can't be done during business hours.

BuckinghamIO commented 3 weeks ago

@hfsaito

I have set key_revocation_action_type = "NONE"

Manually and that seem to have prevented the replacement however it does seem the default isn't working properly.

According to the TF docs it should be NONE by default but clearly not.

nightpool commented 3 weeks ago

This is the second time in as many months that terraform-provider-google has had a major breaking change like this in a minor version. Please avoid taking down people's infra!

slevenick commented 3 weeks ago

The documentation seems wrong here, we don't set any default value on the Terraform side. It may be trying to indicate that the default value that the API uses if the value is unset is "NONE", but I can't verify that.

I'm not sure there is a fix to make here other than clarifying the documentation. If the field was configured outside of Terraform in the past, it is correct for Terraform to see the diff and attempt to reconcile that.

It doesn't appear that the value of "NONE" gets set by the API by default, at least in my attempt to recreate it.

karolgorc commented 3 weeks ago

Waiting on confirmation from upstream whether this is an API bug that only some users may encounter.

slevenick commented 3 weeks ago

Sounds good. I'll move this back to bug until we get confirmation that this happens to users who have not manually configured the setting outside of Terraform

hfsaito commented 3 weeks ago
  1. I didn't manually change any configuration
    • I don't think it's possible to change that manually without recreating the VM
    • If I had done this, I would see this change in a previous Terraform plan updating the resource id
  2. If this configuration it's possible to be changed without replacement, then why it's forcing replacement in this provider?
  3. Why a new attribute that can lead to replacements was added in a minor version number change instead of a major version number change?
slevenick commented 3 weeks ago
  1. I didn't manually change any configuration

    • I don't think it's possible to change that manually without recreating the VM
    • If I had done this, I would see this change in a previous Terraform plan updating the resource id
  2. If this configuration it's possible to be changed without replacement, then why it's forcing replacement in this provider?
  3. Why a new attribute that can lead to replacements was added in a minor version number change instead of a major version number change?

I'm going to try to get the service team involved to see how or why the value of NONE would be set. If it's indeed set without user input in some cases, we should fix this.

Why a new attribute that can lead to replacements was added in a minor version number change instead of a major version number change?

This is unavoidable as new fields are added in Terraform. We add support for new fields all the time and in some cases those fields are not updatable in-place. If the new field has an API-set default we should take that into account and find a way to implement the field where it does not cause a recreation, but in the case where a user has manually configured a setting outside of Terraform the recreation in the plan is correct behavior.

Something that we've seen in the past is that there are cases where a new field is added in the API and a default value is backfilled for all existing resources but all new resources created after that date don't get the default. This can hide the server-side default behavior when we implement the feature as we can't determine the behavior for older resources with the backfilled value. This is what I'm worried happened here.

slevenick commented 3 weeks ago

Talked with the service team. They think that the only way to get the value of NONE is for the client to send that value at some point.

If anyone has reproduction steps that I can use that would be helpful, but until then I don't know if there is much we can do here.

pablo-suazo commented 2 weeks ago

We are facing the same issue. We have not been modified that value, and some VM's have the "NONE" value and other have the value nullified.

If we try to establish the default NONE value to all the VM?s, terraform plan is showing it will recreate some of them.

If we assign the null value, terraform plan is showing that will recreate some VM's.

Any workaround rather to be checking which VM's have a null value or a NONE value?

We have to keep using Google provider 6.8.0 for now because of this.

slevenick commented 2 weeks ago

As a workaround you can specify the ignore_changes block temporarily.

https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes