hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.76k stars 9.56k forks source link

CLI Plan Output Not Showing null to "" Differences for Non-Legacy Type System Providers #31887

Open bflad opened 2 years ago

bflad commented 2 years ago

Terraform Version

Terraform v1.2.9
on linux_amd64

(although this has probably been happening for a very long time to hide terraform-plugin-sdk legacy behaviors)

Terraform Configuration Files

terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "3.4.2"
      # version = "3.3.2"
    }
  }
}

resource "random_password" "test" {
  length = 20
}

Debug Output

Can provide if needed.

Expected Behavior

Either no plan difference output or a plan difference output including every attribute change triggering the difference for providers not using the legacy type system flag in the protocol.

Actual Behavior

$ terraform apply 
random_password.test: Refreshing state... [id=none]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # random_password.test will be updated in-place
  ~ resource "random_password" "test" {
        id          = "none"
        # (12 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Steps to Reproduce

  1. terraform init with version 3.3.2
  2. terraform apply
  3. terraform init -upgrade with version 3.4.2
  4. terraform plan

The issue references also contain a few other reproductions of similar behaviors.

Additional Context

Terraform Cloud UI seems like it was actually able to pick up the "addition" of the attribute being set to "" from null, so I suspect this is just a CLI plan output issue.

We understand why the provider bug occurred, it was an issue on our side of the protocol in the provide code. Inspecting the differences in state after applying the "unexpected" plan showed the change:

4c4
<   "serial": 1,
---
>   "serial": 3,
28c28
<             "override_special": null,
---
>             "override_special": "",
33,34c33
<           "sensitive_attributes": [],
<           "private": "eyJzY2hlbWFfdmVyc2lvbiI6IjIifQ=="
---
>           "sensitive_attributes": []

Aside: terraform-plugin-sdk also would do some potentially unexpected things with what it decided to store in state (empty vs null), especially with imported resources and with maps that have null values, so there's other similar reports we've been working through with migrating providers from that to terraform-plugin-framework. This isn't core's fault, however I'm saying this out loud in case there's other bug reports that wind up in this issue tracker about "strange" plans as folks migrate.

References

jbardin commented 2 years ago

Thanks @bflad!

Do you have any examples of non-legacy providers with issues? The protocol shims were not sufficient to hide the legacy SDK behavior entirely, so IIRC there are some heuristics in the diff rendering to ignore these minor differences too. There might be some way to plumb the "legacy" flag through to the final rendering, but maybe some examples would show an easier workaround?

bflad commented 2 years ago

Hi @jbardin 👋 Could you redescribe what you are looking for? Do you mean start to finish reproduction with the protocol flag never involved? I'm not sure if I have one of those readily available. I was raising this to prevent confusion with migrations and slight differences that might arise.

The reproduction here and some of the similar references are for the random provider, which was migrated off the legacy type system flag in version 3.4.0.

bflad commented 2 years ago

From an out-of-band conversion, one thing I didn't say out loud here is the schema changes that occurred here:

  1. It was originally Optional only in the sdk
  2. It was errantly introduced in the framework version as Optional+Computed+Default plan modifier of “” (timeframe of this bug)
  3. It was fixed back to Optional only afterwards.

So the missing plan difference was specifically a null configuration and Computed difference.

jbardin commented 2 years ago

Thanks, that makes sense now why it was not reported as an error during the plan. The existing plan renderer already needs some extensive work, so we can probably plan to consider this type of change when we get there. Since the CLI will have to deal with legacy providers for quite a while, defaulting to hiding these minor differences may take precedence, but since this shouldn't be a common occurrence, perhaps an option like #27547 to show the plan in complete detail would suffice.

apparentlymart commented 2 years ago

In case it's useful context for someone who works on this in future, I believe this is the current location of the legacy SDK workaround that the view layer implements (as opposed to the one implemented in Terraform Core itself):

https://github.com/hashicorp/terraform/blob/621af43c044b1082cc4ff39fa4080de95cb6355d/internal/command/views/plan.go#L550-L559

The comment here suggests that we did at one point imagine there being a way to handle this in Terraform Core (where it would have access to the "legacy type system" flag) but did this simpler solution to reduce the already-high risk for the v0.12 release. I don't actually remember what that other way was though, so I think we'll have to design it again from scratch if we want to address it now.

apparentlymart commented 2 years ago

Interestingly, the current implementation of NormalizeObjectFromLegacySDK seems to focus only on normalizing the representation of nested blocks, which I don't think explains why override_special (which is an attribute of type string) would be subject to this normalization. I guess that means there's some other normalization going on somewhere else.

bflad commented 2 years ago

I'm not sure if it matters much, but https://github.com/hashicorp/terraform-provider-tls/issues/284 is an example of the opposite issue where "" to null is missing from CLI terminal plan output for a provider without the legacy type system flag:

terraform {
  required_providers {
    tls = {
      source  = "hashicorp/tls"
      version = "3.4.0"
      # version = "4.0.3" # upgrade after initial apply to show missing plan output
    }
  }
}

resource "tls_private_key" "example" {
  algorithm = "ECDSA"
}

resource "tls_self_signed_cert" "example" {
  private_key_pem   = tls_private_key.example.private_key_pem
  is_ca_certificate = true

  subject {
    organization = "Example"
  }

  # 3 Years
  validity_period_hours = 24 * 365 * 3

  allowed_uses = [
    "cert_signing",
    "crl_signing",
  ]
}
$ terraform init
$ terraform apply
$ terraform init -upgrade
$ terraform apply
...
      ~ subject { # forces replacement
            # (1 unchanged attribute hidden)
        }
59,61c59,61
<                 "common_name": "",
<                 "country": "",
<                 "locality": "",
---
>                 "common_name": null,
>                 "country": null,
>                 "locality": null,
63,66c63,66
<                 "organizational_unit": "",
<                 "postal_code": "",
<                 "province": "",
<                 "serial_number": "",
---
>                 "organizational_unit": null,
>                 "postal_code": null,
>                 "province": null,
>                 "serial_number": null,

These attributes are Optional: true only within a block. Please let me know if you'd prefer a separate issue for tracking.

bflad commented 1 year ago

It looks like this is causing confusion when these "hidden" value differences are underneath collection types as well: https://discuss.hashicorp.com/t/plugin-framework-state-upgraders/47699

jbardin commented 1 year ago

For reference, the new renderer inherited the old behavior which is now implemented here.

jbardin commented 1 year ago

@bflad, do you have any example of particularly noisy resource configurations which we could use to evaluate how much output this still suppresses? Now that providers are migrating to the framework where these can be meaningful diffs, I wonder if it's time to just accept the inconvenience that SDK resources might occasionally show larger than expected diff output.

bflad commented 8 months ago

I'm not sure about "noisy" resource configurations, but this is trivial to reproduce via the built-in terraform_data managed resource. I'm guessing we would want this behavior conditionalized on the legacy type system flag, if possible, to limit the plan renderer causing much more noise with the legacy SDK.

terraform {
  required_version = "~> 1.7"
}

resource "terraform_data" "test" {
  # apply like this, then comment or set to null
  input = ""
}

e.g.

$ terraform apply -auto-approve

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create

Terraform will perform the following actions:

  # terraform_data.test will be created
  + resource "terraform_data" "test" {
      + id     = (known after apply)
      + output = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
terraform_data.test: Creating...
terraform_data.test: Creation complete after 0s [id=f95b13bf-5a6b-f206-3604-c97a8f4c7c5c]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

### comment/nullify input

$ terraform plan               
terraform_data.test: Refreshing state... [id=f95b13bf-5a6b-f206-3604-c97a8f4c7c5c]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # terraform_data.test will be updated in-place
  ~ resource "terraform_data" "test" {
        id     = "f95b13bf-5a6b-f206-3604-c97a8f4c7c5c"
      + output = (known after apply)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
akinross commented 8 months ago

Hi @jbardin,

Currently with the provider we are developing with plugin framework explicit empty strings are not displayed correctly in plan, see https://github.com/hashicorp/terraform-plugin-framework/issues/921 for the configuration and plan output details. From plugin framework side the issue has been closed and not planned, but I was forwarded upstream to this issue.

From @bflad I understood that this is something this code base sees as non-trivial and is not something that will be pursued, what is the reasoning behind this? Reason I am asking is that to me it seems that explicit empty string in plan output is not working as it should.

Would it be possible to reconsider making changes to plan renderer that would fix this specific case?

jbardin commented 8 months ago

Hi @akinross, Unfortunately there is no absolute way to differentiate between changes which were intentional, and changes which were because of a misbehaving provider. Completely removing this bit of shim code for legacy provider compatibility would be nice, but it's not something we can do until most of those resources are updated from the SDK to fix the behavior.

In the meantime I have a minor proposal which can detect some cases where the output can be displayed, and we can evaluate if that complexity is worth adding for the partial solution it offers.

akinross commented 8 months ago

Hi @jbardin, Thank you for your response. I understand that you do not want to introduce breaking change regarding legacy providers. Completely removing would impact our provider also, since we are still in a muxed state and working on migration of all the resources. Would it be a possibility to detect in the rendering the type of provider "legacy SDK" vs "plugin framework", to differentiate to allow both to render properly?

jbardin commented 8 months ago

@akinross, the changes are not recorded with any indication of which SDK the provider was using (both internally, and in the serialized plan), so the renderer has no way to differentiate the changes.