hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_bigquery_dataset access evaluates with case #11461

Status: Open. BigBearZab opened this issue 2 years ago.

BigBearZab commented 2 years ago


Terraform Version

Provider: google 4.4.0

Affected Resource(s)

google_bigquery_dataset

Terraform Configuration Files

Access is defined on the dataset using dynamic blocks with for_each, with the values passed in as variables (this is a template file):

resource "google_bigquery_dataset" "default" {
  # next 3 lines will be used to set the dataset info
  dataset_id  = var.dataset_id
  description = var.description
  location    = "EU"

  # access controls are defined in very broad strokes below. These will then be called as variables by the final user to create the dataset
  # first define dataset owners
  dynamic "access" {
    for_each = var.owner_user_by_email
    content {
      user_by_email = access.value
      role          = "OWNER"
    }
  }
  # define editors of the dataset
  dynamic "access" {
    for_each = var.writer_user_by_email
    content {
      user_by_email = access.value
      role          = "WRITER"
    }
  }
  # define individual readers
  dynamic "access" {
    for_each = var.reader_user_by_email
    content {
      user_by_email = access.value
      role          = "READER"
    }
  }
  # define group readers
  dynamic "access" {
    for_each = var.reader_group_by_email
    content {
      group_by_email = access.value
      role           = "READER"
    }
  }
  # define authorised views, views will need to be written as "project.dataset.view"
  dynamic "access" {
    for_each = var.authorised_view
    content {
      view {
        project_id = split(".", access.value)[0]
        dataset_id = split(".", access.value)[1]
        table_id   = split(".", access.value)[2]
      }
    }
  }
}
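The template references several input variables that the issue does not show. A minimal sketch of how the module's variables might be declared, assuming every accessor variable is a list of email strings; the types, defaults, descriptions and the validation rule are all assumptions added for illustration, not taken from the original module:

```hcl
# Hypothetical variables.tf for the dataset template above.
# Types and defaults are assumptions; the original module is not shown.
variable "dataset_id" {
  type = string
}

variable "description" {
  type    = string
  default = ""
}

variable "owner_user_by_email" {
  type    = list(string)
  default = []
}

variable "writer_user_by_email" {
  type    = list(string)
  default = []
}

variable "reader_user_by_email" {
  type    = list(string)
  default = []
}

variable "reader_group_by_email" {
  type    = list(string)
  default = []
}

variable "authorised_view" {
  description = "Authorised views, each written as \"project.dataset.view\""
  type        = list(string)
  default     = []

  # Rejects entries that the split(".", ...) calls in the template
  # could not turn into project_id, dataset_id and table_id.
  validation {
    condition     = alltrue([for v in var.authorised_view : length(split(".", v)) == 3])
    error_message = "Each authorised view must be written as \"project.dataset.view\"."
  }
}
```

With list(string) inputs, access.value inside each dynamic block is a plain email (or view path) string, which is what the template's content blocks expect.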

Debug Output

I don't think this is required

Panic Output

N/A

Expected Behavior

There are 2 issues:

1) If only one email is added, only that change is shown as a diff in the plan.
2) Case is ignored when comparing the emails in the config to the actual state.

Actual Behavior

1) All access entries are shown as being removed and re-added. With a large number of entries this makes the plan very hard to read and check, and the actual change is hard to find.
2) If an email is stored on the Google side as John.Smith@company.com (I don't understand why Google preserves case for emails in IAM) but the config says john.smith@company.com, every subsequent plan will show:

+ john.smith@company.com
- John.Smith@company.com

The comparison should not be case-sensitive.

Steps to Reproduce

1) Create a dataset with multiple accessors using a module whose source points to the template above.
2) Run an apply.
3) Raise a PR that changes one of the existing accessors or adds a new one, and run a terraform plan.
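A minimal sketch of step 1, assuming the template above is stored in a local directory named modules/bigquery_dataset; the path, dataset name, view and all email addresses are hypothetical placeholders:

```hcl
# Hypothetical caller for the dataset template; every value here is a placeholder.
module "example_dataset" {
  source = "./modules/bigquery_dataset"

  dataset_id            = "example_dataset"
  description           = "Dataset used to reproduce the access diff"
  owner_user_by_email   = ["owner@company.com"]
  writer_user_by_email  = ["writer@company.com"]
  reader_user_by_email  = ["john.smith@company.com", "jane.doe@company.com"]
  reader_group_by_email = ["analysts@company.com"]
  authorised_view       = ["my-gcp-project.reporting.summary_view"]
}
```

For step 3, appending one more address to reader_user_by_email and running terraform plan should show whether only that entry, or every access entry, is reported as changed.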

Important Factoids

References

b/301412467

nolan-jardine commented 1 year ago
Quoting point 1 of the expected behavior: "if only one email is added, only this is shown as a diff in the plan"

This is a really big issue for me. I know of no current solution that manages BigQuery dataset access authoritatively, supports authorized datasets/views, and does not generate horrendous plans when a single access change is made. There are currently 5 options for managing BigQuery dataset access, and they all have glaring issues:

  1. google_bigquery_dataset - when a single change is made to access, all access blocks are dropped and re-created. With many access blocks this generates unmanageable plans that are impractical to review for the actual differences. Reviewable plans are very important when managing access, especially when it's dynamic.
  2. google_bigquery_dataset_access - non-authoritative
  3. google_bigquery_dataset_iam_policy - does not support authorized datasets/views
  4. google_bigquery_dataset_iam_binding - does not support authorized datasets/views
  5. google_bigquery_dataset_iam_member - non-authoritative, does not support authorized datasets/views
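For comparison, a rough sketch of options 2 and 4, assuming a dataset named private_dataset in a project named my-gcp-project (the project, dataset, view and members are all placeholders). As listed above, the IAM-style resources cannot express authorized views, so mixing them with access entries on the same dataset is an assumption to verify rather than a recommended pattern:

```hcl
# Option 2: non-authoritative. Adds one access entry (here an authorized view)
# without removing entries that Terraform does not manage.
resource "google_bigquery_dataset_access" "summary_view" {
  project    = "my-gcp-project"
  dataset_id = "private_dataset"

  view {
    project_id = "my-gcp-project"
    dataset_id = "reporting"
    table_id   = "summary_view"
  }
}

# Option 4: authoritative for a single role, so members granted this role
# outside Terraform are removed on apply. It cannot express authorized views.
resource "google_bigquery_dataset_iam_binding" "readers" {
  project    = "my-gcp-project"
  dataset_id = "private_dataset"
  role       = "roles/bigquery.dataViewer"
  members    = ["user:john.smith@company.com", "group:analysts@company.com"]
}
```

The trade-off described in the list shows up directly here: the resource that can hold an authorized view is not authoritative, and the authoritative one has no field for a view.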

I would love it if google_bigquery_dataset did not generate these unmanageable plans, as that seems to be the simplest fix and I like everything else about the resource. However, I would be very open to any approach that solves the problem (i.e. an authoritative way to manage BigQuery dataset access that supports authorized datasets/views and does not generate horrendous plans when a single access change is made).

mwstanleyft commented 8 months ago

I agree that all 5 options are not really suitable for the reasons outlined.

There is one potential workaround: use project-level bindings, which are authoritative, and attach conditions to them so that access is only granted to specific datasets. An example would look like this:

resource "google_project_iam_binding" "staging-dataset-access" {
  project = "my-gcp-project"
  role    = "roles/bigquery.dataViewer"
  members = ["user:my_account@example.com"]

  condition {
    title       = "staging_only"
    description = "Grants access to the dataset Staging"
    expression  = "resource.type == \"bigquery.googleapis.com/Dataset\" && resource.name == \"projects/my-gcp-project/datasets/staging\""
  }
}

But this is pre-GA and has a variety of caveats and limitations, as discussed in the docs for conditional access.
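If the workaround is used for several datasets, one way to avoid repeating the resource is a for_each over a map of dataset IDs to members. A sketch assuming a hypothetical dataset_readers variable and the same placeholder project; it relies on bindings for the same role being kept apart by their conditions, which should be confirmed against the provider's IAM conditions documentation before use:

```hcl
# Hypothetical input: dataset ID -> members that should read that dataset.
variable "dataset_readers" {
  type = map(list(string))
  default = {
    staging   = ["user:my_account@example.com"]
    reporting = ["group:analysts@example.com"]
  }
}

# One conditional, authoritative binding per dataset, all for the same role.
resource "google_project_iam_binding" "dataset_viewer" {
  for_each = var.dataset_readers

  project = "my-gcp-project"
  role    = "roles/bigquery.dataViewer"
  members = each.value

  condition {
    title       = "${each.key}_only"
    description = "Grants access to the dataset ${each.key}"
    expression  = "resource.type == \"bigquery.googleapis.com/Dataset\" && resource.name == \"projects/my-gcp-project/datasets/${each.key}\""
  }
}
```

Because each dataset gets its own binding, adding one member to one dataset should touch only that binding in the plan. Like the single-binding example, this still does not cover authorized datasets/views.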

rileykarson commented 6 months ago

Related-ish: https://github.com/hashicorp/terraform-provider-google/issues/16607

Internal related: b/296451143