google_bigquery_dataset access evaluates with case

BigBearZab commented 2 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Provider: google 4.4.0

Affected Resource(s)

google_bigquery_dataset - specifically access block

Terraform Configuration Files

Access is defined on the dataset using dynamic blocks with for_each, fed by input variables (this is a template file):

resource "google_bigquery_dataset" "default" {
  # next 3 lines will be used to set the dataset info
  dataset_id  = var.dataset_id
  description = var.description
  location    = "EU"

  # access controls are defined in very broad strokes below. These will then be called as variables by the final user to create the dataset
  # first define dataset owners
  dynamic "access" {
    for_each = var.owner_user_by_email
    content {
      user_by_email = access.value
      role          = "OWNER"
    }
  }
  # define editors of the dataset
  dynamic "access" {
    for_each = var.writer_user_by_email
    content {
      user_by_email = access.value
      role          = "WRITER"
    }
  }
  # define individual readers
  dynamic "access" {
    for_each = var.reader_user_by_email
    content {
      user_by_email = access.value
      role          = "READER"
    }
  }
  # define group readers
  dynamic "access" {
    for_each = var.reader_group_by_email
    content {
      group_by_email = access.value
      role           = "READER"
    }
  }
  # define authorised views, views will need to be written as "project.dataset.view"
  dynamic "access" {
    for_each = var.authorised_view
    content {
      view {
        project_id = split(".", access.value)[0]
        dataset_id = split(".", access.value)[1]
        table_id   = split(".", access.value)[2]
      }
    }
  }
}
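For context, the variables this template references could be declared roughly as below. This is only a sketch: the variable names come from the template above, but the types, defaults, descriptions and the validation rule are assumptions, not the reporter's actual file.

```hcl
# Hypothetical variables.tf for the template above.
# Names match the template; types, defaults and validation are assumed.
variable "dataset_id" {
  type        = string
  description = "ID of the BigQuery dataset to create."
}

variable "description" {
  type        = string
  description = "Human-readable description of the dataset."
}

variable "owner_user_by_email" {
  type        = list(string)
  default     = []
  description = "Emails of users granted OWNER."
}

variable "writer_user_by_email" {
  type        = list(string)
  default     = []
  description = "Emails of users granted WRITER."
}

variable "reader_user_by_email" {
  type        = list(string)
  default     = []
  description = "Emails of users granted READER."
}

variable "reader_group_by_email" {
  type        = list(string)
  default     = []
  description = "Emails of groups granted READER."
}

variable "authorised_view" {
  type        = list(string)
  default     = []
  description = "Authorised views, written as project.dataset.view."

  # Assumed guard: reject entries that do not split into exactly three parts.
  validation {
    condition     = alltrue([for v in var.authorised_view : length(split(".", v)) == 3])
    error_message = "Each view must be written as \"project.dataset.view\"."
  }
}
```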

Debug Output

I don't think this is required

Panic Output

N/A

Expected Behavior

There are 2 issues:

1) if only one email is added, only this is shown as a diff in the plan
2) case is ignored in the email comparison of config to actual

Actual Behavior

1) All access is shown as being removed and re-applied. With large amounts of access, this makes the plan very hard to read and check, and it obscures the change actually made.
2) If an email is stored on the Google end as John.Smith@company.com (I don't understand why Google allows case-sensitive emails in IAM) but the config says john.smith@company.com, every subsequent plan will show:

+ john.smith@company.com
- John.Smith@company.com

The comparison should not be case-sensitive.

Steps to Reproduce

1) Create a dataset with multiple accessing users, using a module that points to the above template via source
2) Run an apply
3) Raise a PR changing one of the existing accessors or adding a new one, and run a tf plan
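A module call for step 1 might look something like the following. It is a sketch under assumptions: the module name, the source path, the email addresses and the view name are all placeholders, not values from the original report.

```hcl
# Hypothetical root-module call; the source path and every value are placeholders.
module "example_dataset" {
  source = "./modules/bigquery_dataset" # assumed location of the template

  dataset_id  = "example_dataset"
  description = "Dataset used to reproduce the access diff"

  owner_user_by_email   = ["owner@example.com"]
  writer_user_by_email  = ["writer@example.com"]
  reader_user_by_email  = ["reader.one@example.com", "reader.two@example.com"]
  reader_group_by_email = ["readers@example.com"]
  authorised_view       = ["my-gcp-project.reporting.daily_summary"]
}
```

After the first apply, adding one more address to reader_user_by_email and running a plan should be enough to see whether every access block is shown as removed and re-added.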

Important Factoids

References

b/301412467

nolan-jardine commented 1 year ago

if only one email is added, only this is shown as a diff in the plan

This is a really big issue for me. There is currently no solution that I know of that provides an authoritative way to manage BigQuery dataset access that supports authorized datasets/views and does not generate horrendous plans when a single change to access is made. There are currently 5 options to manage BigQuery dataset access and they all have glaring issues:

google_bigquery_dataset - when a single change is made to access, all access blocks are dropped and re-created. With many access blocks, the resulting plans are unmanageable and it is unreasonable to check them for the real difference. Readable plans are very important when managing access, especially when it's dynamic.
google_bigquery_dataset_access - non-authoritative
google_bigquery_dataset_iam_policy - does not support authorized datasets/views
google_bigquery_dataset_iam_binding - does not support authorized datasets/views
google_bigquery_dataset_iam_member - non-authoritative, does not support authorized datasets/views

I would love it if google_bigquery_dataset did not generate these unmanageable plans, as that seems to be the simplest fix and I like everything else about the resource. However, I would be very open to anything that solves the problem (i.e. an authoritative way to manage BigQuery dataset access that supports authorized datasets/views and does not generate horrendous plans when a single change to access is made).
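To illustrate the trade-off, the per-entry option (google_bigquery_dataset_access) could be sketched roughly as below. This assumes the variables from the template in the original report and a dataset resource named google_bigquery_dataset.default that no longer declares its own access blocks (or ignores changes to them), since mixing the two on the same dataset is likely to cause drift. Each entry gets its own resource address, so a plan should list only the entries that changed, but the approach stays non-authoritative.

```hcl
# Sketch only: one google_bigquery_dataset_access resource per entry.
# Non-authoritative: access granted outside Terraform is not removed.
resource "google_bigquery_dataset_access" "reader" {
  for_each = toset(var.reader_user_by_email)

  dataset_id    = google_bigquery_dataset.default.dataset_id
  role          = "READER"
  user_by_email = each.value
}

# Authorised views, written as "project.dataset.view" as in the template.
resource "google_bigquery_dataset_access" "authorised_view" {
  for_each = toset(var.authorised_view)

  dataset_id = google_bigquery_dataset.default.dataset_id

  view {
    project_id = split(".", each.value)[0]
    dataset_id = split(".", each.value)[1]
    table_id   = split(".", each.value)[2]
  }
}
```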

mwstanleyft commented 8 months ago

I agree that none of the 5 options is really suitable, for the reasons outlined.

One potential workaround is to use project-level bindings, which are authoritative, and attach conditions to them so that access only applies to certain datasets. An example would look like:

resource "google_project_iam_binding" "staging-dataset-access" {
  project = "my-gcp-project"
  role    = "roles/bigquery.dataViewer"
  members = ["user:my_account@example.com"]

  condition {
    title       = "staging_only"
    description = "Grants access to the dataset Staging"
    expression  = "resource.type == \"bigquery.googleapis.com/Dataset\" && resource.name == \"projects/my-gcp-project/datasets/staging\""
  }
}

However, this is pre-GA and has a variety of caveats and limitations, as discussed in the docs for conditional access.

rileykarson commented 6 months ago

Internal related: b/296451143

hashicorp / terraform-provider-google