Terraform 0.12 Set diff shows every field being recreated

hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

https://www.terraform.io

Other

42.76k stars 9.56k forks source link

Terraform 0.12 Set diff shows every field being recreated #21901

Open rileykarson opened 5 years ago

rileykarson commented 5 years ago

Terraform Version

$ TF_LOG= terraform -v
Terraform v0.12.3
+ provider.google (unversioned)

Terraform Configuration Files

resource "google_bigquery_dataset" "temp" {
  dataset_id = "temp"

  default_table_expiration_ms = "2419200000"
  access {
    role          = "OWNER"
    user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
  }

  access {
    user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
    role           = "READER"
  }
}

resource "google_bigquery_dataset" "temp" {
  dataset_id = "temp"

  default_table_expiration_ms = "2419200000"
  access {
    role          = "OWNER"
    user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
  }

  access {
    user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
    role           = "OWNER"
  }
}

Debug Output

https://gist.github.com/rileykarson/830f562dc9c735f4e491777eea4c4a02

Expected Behavior

      + access {
          + role          = "OWNER"
          + user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
        }
      - access {
          - role          = "READER" -> null
          - user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }

Actual Behavior

      - access {
          - role          = "OWNER" -> null
          - user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }
      + access {
          + role          = "OWNER"
          + user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
        }
      - access {
          - role          = "READER" -> null
          - user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }
      + access {
          + role          = "OWNER"
          + user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
        }

Steps to Reproduce

terraform apply

Additional References

See https://github.com/terraform-providers/terraform-provider-google/issues/3929

jbardin commented 5 years ago

Hi @rileykarson,

Looking at the schema for that resource, access is defined as a "set", which is an unordered data structure. If the existing and planned set sets contain identical values, then the plan output will show only the minimal changes. I'm guessing that there's some small difference in the internal structure of the first OWNER value (probably a zero vs null difference) which causes it to show in the diff because the values cannot be determined to be equal.

The referenced issue mentions that the order matters. If that's the case, then the set data type isn't appropriate here and we should migrate access to a list. If the order doesn't matter, can you verify if this has any impact on the resource update itself?

Thanks!

rileykarson commented 5 years ago

access being TypeSet is correct. I don't think this is a problem with zero vs null values. If I apply the same config twice, there's no diff, indicating that the underlying hash value hasn't changed. It appears set hash values are hidden even in debug logs, so I can't confirm that through any other means.

Ordering of elements has no effect. If I replace my second config with the following:

resource "google_bigquery_dataset" "temp" {
  dataset_id = "temp"

  default_table_expiration_ms = "2419200000"

  access {
    user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
    role           = "OWNER"
  }

  access {
    role          = "OWNER"
    user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
  }
}

I get this plan:

      - access {
          - role          = "OWNER" -> null
          - user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }
      + access {
          + role          = "OWNER"
          + user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
        }
      - access {
          - role          = "READER" -> null
          - user_by_email = "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }
      + access {
          + role          = "OWNER"
          + user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
        }

jbardin commented 5 years ago

Thanks for confirming @rileykarson.

The provider shims do some normalization so that the small differences are ironed out when there is no overall change to the resource, but set values can't always be normalized in such a manner, because modifying the set value changes its identity. This is likely the reason (and really the only explanation) this part of the plan output:

          - role          = "OWNER" -> null
          - user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com" -> null
        }
      + access {
          + role          = "OWNER"
          + user_by_email = "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
        }

Chances are that this is from the nested view list block, though there's not much that can be done to improve it with the current SDK. That in itself is harmless, as the values should ends up identical once they reaches the provider. This is usually only a UX problem, since it makes the diffs of sets much harder to read. We may be able to "relax" the diff output a bit more to not show visually identical values in the plan output.

rileykarson commented 5 years ago

I was curious about

Chances are that this is from the nested view list block, though there's not much that can be done to improve it with the current SDK.

Since the Google provider generally stores lists in state as the empty list when nil instead of not storing them at all, I tried manually editing state and running terraform plan -refresh=false, it appears that these produce identical plans:

Original:

            "access": [
              {
                "domain": "",
                "group_by_email": "",
                "role": "OWNER",
                "special_group": "",
                "user_by_email": "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com",
                "view": []
              },
              {
                "domain": "",
                "group_by_email": "",
                "role": "READER",
                "special_group": "",
                "user_by_email": "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com",
                "view": []
              }
            ],

Modified:

            "access": [
              {
                "domain": "",
                "group_by_email": "",
                "role": "OWNER",
                "special_group": "",
                "user_by_email": "tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"
              },
              {
                "domain": "",
                "group_by_email": "",
                "role": "READER",
                "special_group": "",
                "user_by_email": "tf-test-2174102886934250970@graphite-test-rileykarson.iam.gserviceaccount.com"
              }
            ],

jbardin commented 5 years ago

Yup, it's not the view block (though I'm not sure editing the state would show the diff, since blocks are not allowed to be null and it may be reified by when creating the proposed change). It's the fact that the legacy sdk doesn't reliably differentiate between an unset string and an empty string. You can see in the state the resource returned empty strings for for some of the fields (probably not the fault of the resource, just a limitation of the old SDK).

Internally the state looks like

cty.ObjectVal(map[string]cty.Value{
  "domain":cty.StringVal(""),
  "group_by_email":cty.StringVal(""),
  "role":cty.StringVal("OWNER"),
  "special_group":cty.StringVal(""),
  "user_by_email":cty.StringVal("tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"),
  "view":cty.ListValEmpty(cty.Object(map[string]cty.Type{"dataset_id":cty.String, "project_id":cty.String, "table_id":cty.String}))
})

while the proposed value from the config looks like:

cty.ObjectVal(map[string]cty.Value{
  "domain":cty.NullVal(cty.String),
  "group_by_email":cty.NullVal(cty.String),
  "role":cty.StringVal("OWNER"),
  "special_group":cty.NullVal(cty.String),
  "user_by_email":cty.StringVal("tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"),
  "view":cty.ListValEmpty(cty.Object(map[string]cty.Type{"dataset_id":cty.String, "project_id":cty.String, "table_id":cty.String}))
})

dharamsk commented 4 years ago

@jbardin @rileykarson any proposed solution here? I'm not sure what fixing "the legacy SDK" might involve.

This is a primary blocker for us migrating off of TF 0.11. We had a workaround for this with 0.11 by tweaking landscape to treat all sets as unordered. Landscape doesn't support 0.12 and they have no plans to. Our diffs are too large to read due to the length of some of the lists in our resources that are modified frequently.

dharamsk commented 4 years ago

Could a workaround be to modify the provider or terraform CLI to write empty strings as null values?

huydinhle commented 4 years ago

Is this a problem with legacy SDK? If so, we are using new terraform ( v0.12.24) and later google-provider(v3.19.0) and still facing the exact problem. Is there any workaround currently? Our access list can be quite long and it is impossible to read the plan output for our Bigquery dataset.

dharamsk commented 4 years ago

I resorted to hacky scripting and my solution makes me so so sad, but allowed us to be functional with 0.12: https://gist.github.com/dharamsk/19cefd0438d331a26accd1d4795349d7

I really hope a path forward for a fix here can be identified. I still don't understand where (if possible) I should make a PR to have the SDK line up with the state file when it comes to blank strings vs nulls.

wj-chen commented 1 year ago

I'm from the BQ TF team and found this issue as it was recently mentioned again for google_bigquery_dataset. At a glance this seems to be best with a SDK-level fix for schema.TypeSet? What's the current plan?

jbardin commented 1 year ago

Hi @wj-chen, the legacy SDK is essentially frozen and cannot be changed without breaking legacy providers. The Terraform Plugin Framework allows full access to the modern Terraform type system and can handle these types of changes between null and zero values.

wj-chen commented 1 year ago

Thank you @jbardin. @rileykarson - what would be the next step for our providers that are impacted by this?

rileykarson commented 1 year ago

We can cover this in a Google provider-specific issue / internal followup! Feel free to open a new issue if the one you got here from has been closed.

Adopting the plugin framework (https://github.com/hashicorp/terraform-plugin-framework) over the SDK (https://github.com/hashicorp/terraform-plugin-sdk) is the answer, but it'll be a long/difficult migration for the provider as a whole. We're driving that on the core Google provider team.

dwilliams782 commented 7 months ago

Yup, it's not the view block (though I'm not sure editing the state would show the diff, since blocks are not allowed to be null and it may be reified by when creating the proposed change). It's the fact that the legacy sdk doesn't reliably differentiate between an unset string and an empty string. You can see in the state the resource returned empty strings for for some of the fields (probably not the fault of the resource, just a limitation of the old SDK).

Internally the state looks like
cty.ObjectVal(map[string]cty.Value{
  "domain":cty.StringVal(""),
  "group_by_email":cty.StringVal(""),
  "role":cty.StringVal("OWNER"),
  "special_group":cty.StringVal(""),
  "user_by_email":cty.StringVal("tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"),
  "view":cty.ListValEmpty(cty.Object(map[string]cty.Type{"dataset_id":cty.String, "project_id":cty.String, "table_id":cty.String}))
})
while the proposed value from the config looks like:
cty.ObjectVal(map[string]cty.Value{
  "domain":cty.NullVal(cty.String),
  "group_by_email":cty.NullVal(cty.String),
  "role":cty.StringVal("OWNER"),
  "special_group":cty.NullVal(cty.String),
  "user_by_email":cty.StringVal("tf-153@graphite-test-rileykarson.iam.gserviceaccount.com"),
  "view":cty.ListValEmpty(cty.Object(map[string]cty.Type{"dataset_id":cty.String, "project_id":cty.String, "table_id":cty.String}))
})

For anyone that discovers this down the line, the latest provider introduces iam_member field which also needs to be fed an empty string to mute the access block diff:

  access {
    role           = "OWNER"
    special_group  = "projectOwners"
    domain         = ""
    group_by_email = ""
    user_by_email  = ""
    iam_member     = ""
  }