hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Unexpected empty policyTags added to BigQuery tables after import, causing unintended resource changes #19485

Closed. kesompochy closed this issue 2 weeks ago.

kesompochy commented 2 months ago

Terraform Version & Provider Version(s)

Terraform v1.9.5 on x86_64 GNU/Linux; provider hashicorp/google v4.80.0

Affected Resource(s)

google_bigquery_table

Terraform Configuration

resource "google_bigquery_table" "test_table" {
  dataset_id = google_bigquery_dataset.my_dataset.dataset_id
  deletion_protection = false
  table_id   = "test_table"
  project    = var.project_id

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE"
    }
  ]
  EOL
}

import {
  to = google_bigquery_table.test_table
  id = "projects/my-project/datasets/my_dataset/tables/test_table"
}

Debug Output

Please see "Actual Behavior".

Expected Behavior

When importing a BigQuery table that was created with the bq command and then modifying an unrelated attribute (e.g., deletion_protection), the provider should not add empty policyTags to fields that never had them. The resource's eTag should remain unchanged when no actual modification was made to the table's schema or structure. The update request should omit policyTags entirely:

 ---[ REQUEST ]---------------------------------------
PUT /bigquery/v2/projects/avid-influence-381703/datasets/my_dataset/tables/test_table?alt=json&prettyPrint=false HTTP/1.1
Host: bigquery.googleapis.com
User-Agent: google-api-go-client/0.5 Terraform/1.9.5 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/4.80.0
Content-Length: 196
Content-Type: application/json
X-Goog-Api-Client: gl-go/1.19.9 gdcl/0.135.0
Accept-Encoding: gzip

{
  "schema": {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "time",
        "type": "TIMESTAMP"
      }
    ]
  },
  "tableReference": {
    "datasetId": "my_dataset",
    "projectId": "avid-influence-381703",
    "tableId": "test_table"
  }
}

Actual Behavior

After importing a BigQuery table and modifying an unrelated attribute, the provider sends an update request that includes "policyTags": {} for fields, even when no policy tags were present initially. This causes the eTag of the resource to change unexpectedly:

 ---[ REQUEST ]---------------------------------------
PUT /bigquery/v2/projects/avid-influence-381703/datasets/my_dataset/tables/test_table?alt=json&prettyPrint=false HTTP/1.1
Host: bigquery.googleapis.com
User-Agent: google-api-go-client/0.5 Terraform/1.9.5 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/4.80.0
Content-Length: 196
Content-Type: application/json
X-Goog-Api-Client: gl-go/1.19.9 gdcl/0.135.0
Accept-Encoding: gzip

{
  "schema": {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "time",
        "policyTags": {},
        "type": "TIMESTAMP"
      }
    ]
  },
  "tableReference": {
    "datasetId": "my_dataset",
    "projectId": "avid-influence-381703",
    "tableId": "test_table"
  }
}

Steps to reproduce

  1. Create a BigQuery table using the bq command
  2. Import the table into Terraform using terraform import
  3. Modify an unrelated attribute (e.g., deletion_protection)
  4. Run terraform apply
  5. Observe that the eTag has changed in the Terraform state (one way to check is shown below)
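One way to confirm the change outside of Terraform (a sketch; the etag values shown are illustrative) is to compare the etag the API reports before and after the apply:

$ bq show --format=prettyjson my_dataset.test_table | grep etag
  "etag": "before-value...",
$ terraform apply
$ terraform state show google_bigquery_table.test_table | grep etag
    etag = "after-value..."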

This behavior causes unintended schema changes, including changes to the letter case of field names, when using bq query --replace. This is likely because the bq command consults the eTag to decide whether the schema should change.

$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS time FROM my_dataset.source_table'
Waiting on bqjob_r4c9ceead521c2858_00000191f780dca2_1 ... (2s) Current status: DONE
+---------------------+
|        time         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+
$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS TIME FROM my_dataset.source_table'
Waiting on bqjob_r3f95ae4e67f52d74_00000191f78135af_1 ... (1s) Current status: DONE
+---------------------+
|        time         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+
$ terraform apply
$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS TIME FROM my_dataset.source_table'
Waiting on bqjob_r4eca9f9d94559614_00000191f77f8929_1 ... (1s) Current status: DONE
+---------------------+
|        TIME         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+
$ terraform plan # shows force replace because of the schema change

Important Factoids

This behavior has been observed in v4.80.0 and later, including the latest version; it was not observed in v4.79.0 or earlier.

References

https://github.com/hashicorp/terraform-provider-google/pull/15547
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/update

b/371990367

ggtisc commented 1 month ago

Hi @kesompochy!

I tried to replicate this issue with a configuration similar to the one you shared, but the result was successful, with no errors. The etag didn't change and no new policyTags were added:

  1. Initial configuration:
resource "google_bigquery_dataset" "bq_ds_19485" {
  dataset_id                  = "bigquery_dataset_19485"
  friendly_name               = "bigquery-dataset-19485"
  description                 = "something"
  location                    = "EU"
  default_table_expiration_ms = 3600000

  labels = {
    env = "default"
  }
}

resource "google_bigquery_table" "bq_table_19485" {
  dataset_id = google_bigquery_dataset.bq_ds_19485.dataset_id
  deletion_protection = false
  table_id   = "bq-table-19485"
  project    = "my-project"

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE"
    }
  ]
  EOL
}
  2. I removed the google_bigquery_table from the tfstate file to import it with this new code:
resource "google_bigquery_dataset" "bq_ds_19485" {
  dataset_id                  = "bigquery_dataset_19485"
  friendly_name               = "bigquery-dataset-19485"
  description                 = "something"
  location                    = "EU"
  default_table_expiration_ms = 3600000

  labels = {
    env = "default"
  }
}

resource "google_bigquery_table" "bq_table_19485" {
  dataset_id = google_bigquery_dataset.bq_ds_19485.dataset_id
  deletion_protection = false
  table_id   = "bq-table-19485"
  project    = "my-project"

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE"
    }
  ]
  EOL
}

import {
  to = google_bigquery_table.bq_table_19485
  id = "projects/my-project/datasets/bigquery_dataset_19485/tables/bq-table-19485"
}
  3. Finally I changed the value of google_bigquery_table.deletion_protection from false to true.
kesompochy commented 1 month ago

Hi @ggtisc!

I believe this issue is caused by the inclusion of "policyTags": {} when terraform apply is executed. The eTag change may not occur when the deletion_protection change is applied to a resource that was originally created by terraform apply. To replicate the issue, could you please create a BigQuery table with the bq command and then import it into Terraform?

I'd like to note that this issue primarily affects resources created outside of Terraform. As such, it may not require immediate action. Thank you for your attention and response to this matter.

ggtisc commented 1 month ago

@kesompochy could you show us how you are attaching the existing google_bigquery_dataset in the bq command?

If possible, share the full command you are using, with any sensitive information removed.

kesompochy commented 1 month ago

Here are my configurations and commands. There is no error, but the eTag changes after following these steps. I believe this behavior is caused by the addition of empty policyTags in the API request.

  1. Initial Terraform configuration:
provider "google" {
  project     = var.project_id
  region      = "us-central1"
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.80.0"
    }
  }
}

resource "google_bigquery_dataset" "my_dataset" {
  dataset_id = "my_dataset"
  project    = var.project_id
}
  2. Apply this configuration:
$ terraform apply
  3. Create a table using the bq command:
$ bq mk --table my-project:my_dataset.test_table time:TIMESTAMP
  4. Add an import block to the Terraform code:
resource "google_bigquery_table" "test_table" {
  dataset_id = google_bigquery_dataset.my_dataset.dataset_id
  deletion_protection = true
  table_id   = "test_table"
  expiration_time = "0"
  project    = var.project_id

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP"
    }
  ]
  EOL
}

import {
  to = google_bigquery_table.test_table
  id = "projects/my-project/datasets/my_dataset/tables/test_table"
}
  5. Apply this configuration:
$ terraform apply -var-file="terraform.tfvars"
  6. Check the etag in the state:
$ terraform state show google_bigquery_table.test_table
  7. Modify an unrelated attribute (e.g., deletion_protection) in the Terraform code:
resource "google_bigquery_table" "test_table" {
  dataset_id = google_bigquery_dataset.my_dataset.dataset_id
  deletion_protection = false # modified
  table_id   = "test_table"
  expiration_time = "0"
  project    = var.project_id

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP"
    }
  ]
  EOL
}
  8. Apply this configuration:
$ terraform apply -var-file="terraform.tfvars"
  9. Check the eTag, which has now changed:
$ terraform state show google_bigquery_table.test_table
ggtisc commented 1 month ago

Hi @kesompochy!

After replicating this step by step, it is true that the etag changes in the tfstate file, but this is just a version-control mechanism and it doesn't affect the resources. On the other hand, no policy tags were added in any of these steps.

Terraform uses the etag to determine if a table has been modified outside of its control. If other resources depend on the google_bigquery_table, changes to the etag might trigger updates to those dependent resources as well. It is important to consider that while the etag is generally a reliable indicator of changes, it's not always foolproof. In some cases, minor updates or transient conditions might not result in an etag change.

In conclusion, the fact that the etag changes is considered normal behavior, since the deletion_protection argument was changed from true to false.

kesompochy commented 1 month ago

Hi @ggtisc!

Thank you for replicating my steps. I agree with your conclusion. I understand that the etag change is uncontrollable from Terraform's perspective.

However, I'm still concerned about the behavior where unexpected empty policyTags are sent in the API request, even when the table schema hasn't changed after importing. I believe the change in policyTags doesn't appear in the Terraform state, but it does occur in the API request. The table's etag changes, likely because the addition of empty policyTags causes the remote resource to be treated as if it has been modified.

Could we improve the behavior so that empty policyTags are not added when policyTags are not explicitly set in the Terraform configuration? I'm aware of the related PR https://github.com/hashicorp/terraform-provider-google/pull/15547 regarding policy tag behavior, and I understand that any change would need to stay consistent with that context. The current behavior might be intentional, as discussed in issue https://github.com/hashicorp/terraform-provider-google/issues/14047 .
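For illustration, explicitly attaching a policy tag in the schema would look roughly like this (the taxonomy and policy tag IDs are placeholders):

resource "google_bigquery_table" "tagged_table" {
  dataset_id          = google_bigquery_dataset.my_dataset.dataset_id
  deletion_protection = false
  table_id            = "tagged_table"
  project             = var.project_id

  schema = <<EOL
  [
    {
      "name": "time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE",
      "policyTags": {
        "names": [
          "projects/my-project/locations/us/taxonomies/123456789/policyTags/987654321"
        ]
      }
    }
  ]
  EOL
}

My suggestion is that only configurations like this one, where policyTags actually appears in the schema, would produce a policyTags key in the API request.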

Additionally, although this is not directly related to the policyTags issue, I was wondering if it's possible to avoid sending API requests when only the deletion_protection attribute is changed. I consider deletion_protection to be a feature managed solely by Terraform. If we could avoid sending API requests to Google Cloud for attributes that are managed exclusively by Terraform, I believe it could contribute to performance improvement.

ggtisc commented 1 month ago

After several tries I can't reproduce this issue; it looks intermittent. I'm forwarding it for a deeper review.

wj-chen commented 1 month ago

The current behavior might be intentional, as discussed in the issue https://github.com/hashicorp/terraform-provider-google/issues/14047 .

Yes, treating the absence of policyTags in the config file as "policyTags": {} in the API request is intentional, to support unsetting policy tags on a table. The API treats the absence of policy tags in a request as a no-op, so for a table that previously had policy tags attached, a user who wants to clear them needs to explicitly send "policyTags": {} in the API request. In Terraform, that is done by omitting the policyTags attribute from the config file. This means that for a table that has never had policy tags in its schema, Terraform still sends "policyTags": {} to the API.
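To make that concrete, here are the two schema-field fragments as they might appear in a tables.update request body (the // annotations are mine and are not valid JSON):

// Omitted: the API treats this as a no-op and leaves any existing
// policy tags on the field untouched.
{ "mode": "NULLABLE", "name": "time", "type": "TIMESTAMP" }

// Empty object: the API clears whatever policy tags are currently
// attached to the field.
{ "mode": "NULLABLE", "name": "time", "policyTags": {}, "type": "TIMESTAMP" }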

There may be some options to mitigate the side effect that causes the behavior you are reporting, but I'd like to understand more the impact of the changing table etag for your use case. Can you elaborate on how bq query --replace plays a role here?

If we could avoid sending API requests to Google Cloud for attributes that are managed exclusively by Terraform, I believe it could contribute to performance improvement.

I'm not aware of a mechanism that allows for this today, but the idea has merit and I'll bring it up for discussion.

kesompochy commented 1 month ago

Thank you for explaining the purpose and details regarding the empty policy tag behavior.

I'd like to share an unexpected scenario I encountered:

  1. In my BigQuery project, a daily bq query --replace job runs to generate data for several projects.
  2. I imported a table created by this bq query --replace workflow into Terraform.
  3. I modified the deletion_protection setting of the table in my Terraform configuration, which resulted in a change to the table's etag.
  4. The following day, the bq query --replace workflow altered the schema column names from lowercase to uppercase, as the bq query --replace command specified them in uppercase. I suspect that bq query --replace hadn't modified the schema before the table was imported into Terraform, likely because the etag matched the one from when the table was initially created. Please note that this is my hypothesis, as we cannot definitively determine the bq command's internal mechanism due to its closed-source nature.
  5. When I subsequently attempted to apply the Terraform configuration, Terraform planned to replace the entire table due to the schema change.

I understand that this is a highly specific scenario that likely affects very few users.

I'm not aware of a mechanism that allows for this today, but the idea has merit and I'll bring it up for discussion.

I appreciate your time and consideration. Please let me know if you think this issue is worth addressing, or if there's a better way I can contribute to improving this aspect of the provider. I understand that this may not be a high priority, but I believe fixing it could prevent unnecessary API calls and potential unexpected behaviors in similar scenarios.

wj-chen commented 1 month ago

The following day, the bq query --replace workflow altered the schema column names from lowercase to uppercase

Thanks for sharing the additional details. I'm still trying to understand how the diff detection finds a case mismatch in the column name. Is it that the table created using bq query --replace has an upper-case name in the schema (e.g. TIME as mentioned in the original post), while the Terraform config has a lower-case name (e.g. time)? If so, is there a reason why the two schema specs are inconsistent? What if you changed the Terraform config schema to also be upper case?

as we cannot definitively determine the bq command's internal mechanism due to its closed-source nature.

I happened to work on the bq commands too. As part of a query --replace command, there is no reading of the eTag on the client side. The CLI simply gathers the input parameters and makes a jobs.insert request with them. I don't have knowledge of what happens on the backend when handling that request, or whether the eTag is involved there.
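For reference, here is a rough sketch of the jobs.insert request body such a command produces (values taken from the transcripts above; the exact shape is an approximation). Note that no eTag appears anywhere in it:

{
  "configuration": {
    "query": {
      "query": "SELECT time AS TIME FROM my_dataset.source_table",
      "useLegacySql": false,
      "destinationTable": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "test_table"
      },
      "writeDisposition": "WRITE_TRUNCATE"
    }
  }
}

The --replace flag maps to writeDisposition WRITE_TRUNCATE, which overwrites the destination table's data and schema with the query result.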

I believe fixing it could prevent unnecessary API calls and potential unexpected behaviors in similar scenarios.

I just heard back, and there does appear to be a way to make this optimization today! It was added to the documentation as part of https://github.com/GoogleCloudPlatform/magic-modules/pull/11330, and I'll look into applying it to google_bigquery_table, which might solve your issue. It's generally good to have anyway. Please continue to follow this issue for updates.

kesompochy commented 4 weeks ago

Thank you for your investigation, @wj-chen. It appears the bq query --replace command does not apply the schema case change described in https://github.com/hashicorp/terraform-provider-google/issues/19485#issue-2528412374: when a table was created by bq query --replace with a lowercase field, subsequent bq query --replace commands ignore the case mismatch.

$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS time FROM my_dataset.source_table'
Waiting on bqjob_r4c9ceead521c2858_00000191f780dca2_1 ... (2s) Current status: DONE
+---------------------+
|        time         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+
$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS TIME FROM my_dataset.source_table'
Waiting on bqjob_r3f95ae4e67f52d74_00000191f78135af_1 ... (1s) Current status: DONE
+---------------------+
|        time         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+

However, after Terraform modifies the table configuration, bq query --replace then changes the field case.

$ terraform apply
$ bq query --replace --use_legacy_sql=false --destination_table=my_dataset.test_table 'SELECT time AS TIME FROM my_dataset.source_table'
Waiting on bqjob_r4eca9f9d94559614_00000191f77f8929_1 ... (1s) Current status: DONE
+---------------------+
|        TIME         |
+---------------------+
| 2024-09-16 06:57:34 |
+---------------------+

This suggests that the issue is not related to the Terraform config schema case itself.

I'll look into applying that to google_bigquery_table, which might solve your issue.

That's excellent news! Thank you very much for your efforts on this matter.

wj-chen commented 2 weeks ago

Updating deletion_protection should no longer trigger an Update API request once https://github.com/GoogleCloudPlatform/magic-modules/pull/12174 is released in provider v6.11.0 (in about two weeks). If you continue to have problems, please reopen this issue or create a new one.
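Once that release is out, a provider version constraint along these lines (a sketch) would pick up the change:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.11.0"
    }
  }
}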

kesompochy commented 2 weeks ago

Thanks @wj-chen for working on this! Really looking forward to using v6.11.0 with the improved functionality!