hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_project_iam_member/google_project_iam_binding Fails for roles/cloudsql.client, Works for Other #5107

Closed jjorissen52 closed 4 years ago

jjorissen52 commented 4 years ago

Terraform Version

Terraform v0.12.10
+ provider.google v2.20.0
+ provider.local v1.4.0
+ provider.null v2.1.2

Affected Resource(s)

google_project_iam_member
google_project_iam_binding

Terraform Configuration Files

# resources fail, but the below gcloud command succeeds
gcloud projects add-iam-policy-binding booklawyer-dev-259701 \
    --member serviceAccount:sql-client@booklawyer-dev-259701.iam.gserviceaccount.com \
    --role roles/cloudsql.client

resource "google_service_account" "sql_client" {
    account_id = "sql-client"
    display_name  = "sql-client"    
}

# fails
resource "google_project_iam_member" "sql_client" {
    role = "roles/cloudsql.client"

    member = "serviceAccount:${google_service_account.sql_client.email}"
    # these roles have to be created before they can be assigned 
    depends_on = ["google_service_account.sql_client"]
}

# also fails
# resource "google_project_iam_binding" "sql_client" {
#     role = "roles/cloudsql.client"

#     members = ["serviceAccount:${google_service_account.sql_client.email}", ]
#     # these roles have to be created before they can be assigned 
#     depends_on = ["google_service_account.sql_client"]
# }

resource "google_storage_bucket" "etl_config" {
  bucket_policy_only = true
  force_destroy      = true
  name               = "etl-config-${var.project.number}"
  requester_pays     = false
  storage_class      = "STANDARD"
}

# succeeds
resource "google_storage_bucket_iam_member" "etl_config" {
  bucket = "${google_storage_bucket.etl_config.name}"
  role        = "roles/storage.objectAdmin"
  member      = "serviceAccount:${google_service_account.sql_client.email}"
}

Debug Output

https://gist.github.com/jjorissen52/d253d274cdb763b47b55cbe3ee0f19e2

Expected Behavior

Binding should happen

Actual Behavior

Error: Batch "iam-project-booklawyer-dev-259701 modifyIamPolicy" for request "Create IAM Members roles/cloudsql.client serviceAccount:sql-client@booklawyer-dev-259701.iam.gserviceaccount.com for \"project \\\"booklawyer-dev-259701\\\"\"" returned error: Error applying IAM policy for project "booklawyer-dev-259701": Error setting IAM policy for project "booklawyer-dev-259701": googleapi: Error 400: Request contains an invalid argument., badRequest

  on ../etl/iam.tf line 20, in resource "google_project_iam_member" "sql_client":
  20: resource "google_project_iam_member" "sql_client" {

Steps to Reproduce

  1. terraform apply

Important Factoids

I have been able to use this exact resource setup to apply other roles to other service accounts.

References

Resolution here does not seem to work:

morgante commented 4 years ago

I've been noticing the same error across many different projects as of today:

For example, this config is causing this error:

Step #0 - "prepare": Error: Batch "iam-project-ci-gcloud-b081 modifyIamPolicy" for request "Create IAM Members roles/owner serviceAccount:ci-account@ci-gcloud-b081.iam.gserviceaccount.com for \"project \\\"ci-gcloud-b081\\\"\"" returned error: Error applying IAM policy for project "ci-gcloud-b081": Error setting IAM policy for project "ci-gcloud-b081": googleapi: Error 400: Policy members must be of the form "<type>:<value>"., badRequest
Step #0 - "prepare": 
Step #0 - "prepare":   on iam.tf line 29, in resource "google_project_iam_member" "int_test":
Step #0 - "prepare":   29: resource "google_project_iam_member" "int_test" {
Step #0 - "prepare": 

The error is quite confusing, because serviceAccount:ci-account@ci-gcloud-b081.iam.gserviceaccount.com looks valid as an IAM member to me.

I think the right fix is likely to filter out deleted principals when sending the IAM policy back.

morgante commented 4 years ago

I've been doing a bit more investigation into this (tracked in #333). I've been able to consistently reproduce it on my project; here are the debug logs.

Looking at the logs, I suspect the issue is related to deleted IAM principals. Specifically, I see that we attempt to reflect a deleted IAM principal back in the setPolicy response.

I've also done some version testing:

  1. It does not occur on 2.12.0
  2. It does occur on 2.13.0
  3. It does occur on 3.1.0

Right now the best workaround I can find is to pin the provider to ~> 2.12.0.
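
For reference, a minimal sketch of that pin, assuming Terraform 0.12 (where the version constraint can still live in the provider block; newer Terraform releases expect it under required_providers):

```hcl
# Assumption: Terraform 0.12-style provider block.
# "~> 2.12.0" allows 2.12.x patch releases only, so 2.13.0 and later are never selected.
provider "google" {
  version = "~> 2.12.0"
}
```

Run terraform init again after changing the constraint so that a matching plugin version is installed.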

slevenick commented 4 years ago

I've got a fix for this on the way: https://github.com/GoogleCloudPlatform/magic-modules/pull/2819

slevenick commented 4 years ago

As a workaround until the fix is released you can delete service account IAM members with the deleted: prefix and terraform will work as usual.

This issue is caused specifically by deleted service accounts that exist on the resource that terraform is managing members on, so removing references to them will allow terraform to work normally.

slevenick commented 4 years ago

This fix is available now in the 2.20.1 version of the provider, and will be available for 3.x in the 3.3.0 release expected next week.

jjorissen52 commented 4 years ago

@slevenick I've just attempted it after pinning v2.20.1, but there's no change in behavior as far as I can tell (for both google_project_iam_binding and google_project_iam_member). Any advice for me?

Terraform v0.12.10
+ provider.archive v1.3.0
+ provider.google v2.20.1
+ provider.local v1.4.0
+ provider.null v2.1.2

slevenick commented 4 years ago

@jjorissen52 can you provide debug logs for the failing run? That will help me debug what is going on

jjorissen52 commented 4 years ago

@slevenick unfortunately, earlier today I bumped up to v3.2.0 on this project for an unrelated reason, and I am unable to downgrade again (trying to do so results in an error with terraform apply).

slevenick commented 4 years ago

The 3.3.0 release is expected to go out tomorrow which has this fix. Please let me know if you encounter the same issue with that version, but I'll close this until then.

I believe this issue has been fixed with 2.20.1, as I am unable to reproduce it at this point.

Downgrading from 3.x to 2.x is going to be difficult and is not recommended.

madmaze commented 4 years ago

I am definitely still encountering this issue with 2.20.1, ~is it possible that version does not yet include the fix?~ nvm, i checked the tag, the fix should be in there.

Error: Batch "iam-project-demo modifyIamPolicy" for request "Create IAM Members roles/stackdriver.resourceMetadata.writer serviceAccount:staging-cluster-sa@demo.iam.gserviceaccount.com for \"project \\\"demo\\\"\"" returned error: Error applying IAM policy for project "demo": Error setting IAM policy for project "demo": googleapi: Error 400: Request contains an invalid argument., badRequest

  on .terraform/modules/gke_service_account/main.tf line 33, in resource "google_project_iam_member" "service_account-roles":
  33: resource "google_project_iam_member" "service_account-roles" {

I also upgraded everything to 3.3.0 and I'm still seeing the issue. If I blow everything away and go back to 2.12.0, everything still seems to work.

lobsterdore commented 4 years ago

I have just tried this with version 3.4.0 and I am getting the same error, here's a code snippet:

resource "google_service_account" "cloud_sql" {
  account_id   = "dev-cloud-sql"
  display_name = "dev-cloud-sql"
}

resource "google_project_iam_binding" "cloud_sql_iam" {
  depends_on = [google_service_account.cloud_sql]
  role    = "roles/cloudsql.client"

  members = [
    "serviceAccount:${google_service_account.cloud_sql.email}"
  ]
}

Error output:

Error: Batch "iam-project-xxx modifyIamPolicy" for request "Set IAM Binding for role \"roles/cloudsql.client\" on \"project \\\"xxx\\\"\"" returned error: Error applying IAM policy for project "xxx": Error setting IAM policy for project "xxx": googleapi: Error 400: Request contains an invalid argument., badRequest

  on ../../../modules/db_database/main.tf line 20, in resource "google_project_iam_binding" "cloud_sql_iam":
  20: resource "google_project_iam_binding" "cloud_sql_iam" {

slevenick commented 4 years ago

@madmaze or @lobsterdore can you include a debug log for the failed apply?

I am able to apply the config provided with 3.3.0, but a debug log would help identify the issue

jjorissen52 commented 4 years ago

@slevenick, I just upgraded to v3.4.0 and can confirm that this is still affecting me. Debug Logs

Terraform v0.12.10
+ provider.archive v1.3.0
+ provider.google v3.4.0
+ provider.local v1.4.0
+ provider.null v2.1.2

terraform apply -target=module.booklawyer.module.etl.google_project_iam_binding.sql_client

Shows same error as before:

Error: Batch "iam-project-booklawyer-dev-259701 modifyIamPolicy" for request "Set IAM Binding for role \"roles/cloudsql.client\" on \"project \\\"booklawyer-dev-259701\\\"\"" returned error: Error applying IAM policy for project "booklawyer-dev-259701": Error setting IAM policy for project "booklawyer-dev-259701": googleapi: Error 400: Request contains an invalid argument., badRequest

  on ../etl/iam.tf line 12, in resource "google_project_iam_binding" "sql_client":
  12: resource "google_project_iam_binding" "sql_client" {

Debug Logs

slevenick commented 4 years ago

@jjorissen52 That is odd. Can you apply the same config on a new (clean) project?

I suspect that there is something strange happening with the IAM policy for your existing project. I believe this is an unrelated issue, but it presents with the same (not very helpful) error message.

Looking at the debug log, I would guess that this is causing the failure:

2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:   {
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:    "role": "roles/owner",
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:    "members": [
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:     "user:",
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:     "user:",
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:     "user:",
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:     "user:"
2020-01-07T15:36:29.562-0600 [DEBUG] plugin.terraform-provider-google_v3.4.0_x5:    ]

Terraform receives an IAM policy that has a series of members named user: from the API. To my eye this looks blatantly wrong, and using the iam_binding resource within terraform attempts to preserve any existing members, so it posts the same series of user: members back.

I believe that removing these faulty members will cause terraform to succeed. Could you try either using the console or gcloud to remove these members, or using a project_iam_policy which is authoritative?
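
A rough sketch of the authoritative option, assuming Terraform 0.12 syntax. The project ID, roles, and members below are placeholders, not values from this issue. Note that google_project_iam_policy replaces the whole project policy, so any binding left out of the data source (including your own owner access and Google-managed service agents) is removed on apply:

```hcl
# Hypothetical project ID and members, for illustration only.
data "google_iam_policy" "project" {
  # Keep at least one owner you control, or you can lock yourself out.
  binding {
    role    = "roles/owner"
    members = ["user:owner@example.com"]
  }

  binding {
    role    = "roles/cloudsql.client"
    members = ["serviceAccount:sql-client@my-project.iam.gserviceaccount.com"]
  }
}

# Authoritative: the project's IAM policy becomes exactly policy_data.
resource "google_project_iam_policy" "project" {
  project     = "my-project"
  policy_data = data.google_iam_policy.project.policy_data
}
```

Because this overwrites everything, it is probably safer to try it on a scratch project first, or to stick with removing the faulty members by hand.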

jjorissen52 commented 4 years ago

@slevenick Apologies, I manually modified those lines so as not to publish my co-workers' email addresses. Each of those lines originally contained a valid address of the form valid-user@valid-domain.com. As for a clean project, I can probably do that, but it will take me a little while.

slevenick commented 4 years ago

Ok that makes sense.

I'm back to being confused about why this is happening. This seems unrelated to the other issues around deleted: IAM members, though it started occurring at the same time. It could possibly be related to changes in the IAM API that happened around the filing date of this issue.

Were you able to successfully apply this config with versions of the provider after 2.12.0 prior to filing this issue?

What I'm trying to figure out is if this broke with the 2.13.0 release or if the combination of 2.13.0+ and the API changes that happened around Dec 6th are causing it.

jjorissen52 commented 4 years ago

@slevenick I had never attempted this particular role assignment (roles/cloudsql.client) using a resource "google_project_iam_binding" "" {} block before on any version, but I do have a project that assigns a role which currently uses provider.google v2.16.0.

resource "google_project_iam_binding" "cloudbuild-sa-user" {
    project = "${google_project_services.project.project}"
    role    = "roles/iam.serviceAccountUser"

    members = [
      "serviceAccount:${local.cloud_build_sa}",
    ]
}

Unfortunately, I cannot tell if this is the version that was used when creating the binding or if I've since updated the version; the state history does not seem to contain information about provider versions.

madmaze commented 4 years ago

I have debug logs for both v2.12.0 and v2.20.1. Are there any specific parts that would be most valuable to share? I'm hesitant to share the whole log, as it's full of seemingly sensitive info.

I've cleaned up two snippets, 2.12.0 & 2.20.1, which seem relevant to me. Apart from the ordering, the data sent looks exactly the same except for the etag (2.12.0 json & 2.20.1 json), and I'm not sure whether that is supposed to change. https://gist.github.com/madmaze/ccda69be4ac861f6ac0fc15cdf9e8bf3

Two other differences seem to be in the headers:

slevenick commented 4 years ago

@madmaze those are helpful logs, but they don't seem to indicate what the issue is.

The nearly identical request failing is very strange. The change in etag between the two requests is expected as it is used for locking, and should change whenever the IAM policy is updated.

I'm asking around internally to try and track down an answer on this.

How are you resetting the IAM policy when you change the provider version? Could you include the config that you are using to reproduce this?

madmaze commented 4 years ago

I don't have access to the actual files right now, but here is the order of operations I performed:

  1. use config from this original issue, set provider to 2.12.0
  2. terraform init
  3. comment out the google_project_iam_member block
  4. terraform apply
  5. uncomment the google_project_iam_member block
  6. terraform apply
  7. grab log..
  8. terraform destroy
  9. update the provider version to 2.20.1, then repeat steps 1-7

michyliao commented 4 years ago

I am also seeing this issue when applying iam_member with provider.google: version = "~> 3.4"

Error: Batch "iam-project-<project-id> modifyIamPolicy" for request "Create IAM Members roles/storage.objectAdmin serviceAccount:<service-account-id>@<project-id>.iam.gserviceaccount.com for \"project \\\"<project-id>\\\"\"" returned error: Error applying IAM policy for project "<project-id>": Error setting IAM policy for project "<project-id>": googleapi: Error 400: The role name must be in the form "roles/{role}", "organizations/{organization_id}/roles/{role}", or "projects/{project_id}/roles/{role}"., badRequest

  role       = "roles/storage.objectAdmin"
  member     = "serviceAccount:${module.module-name.email}"

In the debug logs, I am seeing this: eval: *terraform.EvalMaybeTainted

slevenick commented 4 years ago

@michyliao that looks like a different issue. Can you file a separate issue with debug logs included?

slevenick commented 4 years ago

I'm unable to track this down by just the error message from the debug logs (invalid argument is very generic)

I'll probably need to be able to reproduce this to make further progress. @madmaze can you send me the full debug logs for a failing run? It would help to have the full request/response pair without any changes. If you don't want to post them publicly could you send them to my username @google.com

jjorissen52 commented 4 years ago

@slevenick It seems that, for the affected project, resource "google_project_iam_binding" always fails to apply. Should I update the title to more accurately describe the issue?

akrasnov-drv commented 4 years ago

I ran into this bug just today and am very surprised that it has gone unfixed for months. After wasting several hours, I found that the member/binding resources fail when the project has a user with capital letter(s) in its ID (email). Fortunately, I had just one inactive user with capital letters, so I was able to remove it and apply my "google_project_iam_member" rules.

The error message "Error 400: Request contains an invalid argument., badRequest" is misleading. As I wrote above, the actual cause is capital letters in a project user's ID (in our case a user with "owner" permissions, if that makes any difference).

The weirdest part is that I can't add that user back with lowercase letters. Google looks up the (lowercase) email I provide in its user database(s) and adds it with capital letters again.

Please fix. // Hope this message saves someone some time

slevenick commented 4 years ago

Hey @akrasnov-drv sorry that this caused issues for you.

How are you adding back the user with lower case letters? Can you give me an overview of your workflow, like are you using terraform to attempt to add this user back, but it gets sent as lowercase@mail.com and comes back as LOWERCASE@mail.com?

akrasnov-drv commented 4 years ago

Hi @slevenick. User creation is not actually relevant to the case; it's just another side effect that adds trouble. I created the user in the Google console (IAM). I specified the lowercase useremail@gmail.com, and Google found it, but then added the user as UserEmail@gmail.com (most likely because that is how the user originally registered it in Gmail). The terraform google provider bug is that it can't work with such "unusually formatted" emails and produces a misleading error. I understand that the RFC defines email addresses as case insensitive, but Google keeps them case sensitive, therefore the google provider should support this too.

slevenick commented 4 years ago

Hm, can you provide debug logs for the failing run? I'm unable to create a user with capital letters in their name. I have created a user with capital letters, but the IAM console only finds it as lowercase, which doesn't cause any issues.

akrasnov-drv commented 4 years ago

Yes, sure. As I wrote before, Google provides the email it finds in its databases, and it keeps the capital/lowercase letters as they are in its DB. I don't know if you can register a new Google user with capital letters in the email now, but it was definitely possible in the past.

Test code

resource "google_service_account" "del_me" {
  account_id = "del-me"
  display_name  = "bug test sa"
}
resource "google_project_iam_member" "bug_test_role" {
  role    = "roles/compute.instanceAdmin"
  member = "serviceAccount:${google_service_account.del_me.email}"
  depends_on = [google_service_account.del_me]
}

Error

google_project_iam_member.bug_test_role: Creating...

Error: Batch "iam-project-my-project modifyIamPolicy" for request "Create IAM Members roles/compute.instanceAdmin serviceAccount:del-me@my-project.iam.gserviceaccount.com for \"project \\\"my-project\\\"\"" returned error: Error applying IAM policy for project "my-project": Error setting IAM policy for project "my-project": googleapi: Error 400: Request contains an invalid argument., badRequest. To debug individual requests, try disabling batching: https://www.terraform.io/docs/providers/google/guides/provider_reference.html#enable_batching

  on security.tf line 19, in resource "google_project_iam_member" "bug_test_role":
  19: resource "google_project_iam_member" "bug_test_role" {
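
As the error text suggests, batching can be turned off so that each IAM request is sent and logged on its own. A minimal sketch, assuming a provider version that supports the batching block described in the provider reference linked in the error:

```hcl
provider "google" {
  # Assumption: the provider version in use supports this block.
  # With batching disabled, each IAM change is sent as its own request,
  # which makes the failing call easier to spot in the debug log.
  batching {
    enable_batching = false
  }
}
```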

The log (attached, with some security related masking) is for google-beta but it fails the same way for google too.

akrasnov-drv commented 4 years ago

tf.log

slevenick commented 4 years ago

That's very unusual. How did you create the user with capital letters, is it just an old email that existed?

And you have found that removing the user with capital letters allows you to apply the binding?

I'll ask around for why the API would be returning upper case values and if this is intended we should handle this correctly in Terraform

akrasnov-drv commented 4 years ago

There are plenty of complaints on the Internet about these resources not working. I believe all (or most) of them have this issue (user(s) with upper case letter(s)). This should be handled by the terraform provider. I do not believe Google will update its user databases (or API)...

slevenick commented 4 years ago

@jjorissen52 does your IAM policy have users with upper case letters?

I'm tracking down the intended behavior here, and will definitely handle this in the provider if needed

jjorissen52 commented 4 years ago

@slevenick The project does have one user with capital letters in the email, though none of the bindings defined via terraform do anything with that user. I don't know if that makes a difference.

akrasnov-drv commented 4 years ago

Yes, I also do nothing with the problem user. But you can see it in the debug output, and its mere existence breaks the workflow. @josephlewis42 if you have an option to (temporarily) remove that user, you'll see that it fixes your terraform processing.

jjorissen52 commented 4 years ago

@akrasnov-drv @slevenick That was it.

  1. Run apply with the binding. Failure.
  2. Remove user with capital letters in their Gmail account from IAM via cloud console.
  3. Run apply with binding. Success!

slevenick commented 4 years ago

@akrasnov-drv thank you for figuring out the root cause of this issue!

I still cannot reproduce, but it seems like this is a (somewhat) common case, so I'll find a fix

lpzdvd-packlink commented 4 years ago

I ended up here facing the same issue. I was using google_project_iam_member with the member given as

foo@xxx.iam.gserviceaccount.com

Fixed by using:

serviceAccount:foo@xxx.iam.gserviceaccount.com

It's in the docs anyway.

slevenick commented 4 years ago

I'm still having trouble reproducing this issue, and I believe that there is something strange going on with the particular emails being used here as emails are not handled case sensitively by the API.

Can I have one of you @akrasnov-drv or @jjorissen52 send me the actual email that is causing the problems? It will help me track down what exactly about these users is causing the issue.

You can send it to my github username @google.com

akrasnov-drv commented 4 years ago

Hi, Have you seen email I sent you about a week ago? Any progress?

innovia commented 4 years ago

@slevenick

I've hit the same issue today running terraform gke public module

I believe that the issue happens when attempting to add a role to a new service account (existing policy), you have to first fetch the policy which includes the user with the capital letter, then append to it and apply it.

If you can point me to the code where this is done, I can try to replicate it using the gcloud CLI and see if it's an SDK issue or an implementation issue (usually the SDK will make fixes to it before applying it).

Update:

I'm unable to replicate it on a single role that already contains a CamelCase user name. Maybe it's an issue with the size of the payload?

resource "google_service_account" "sa" {
  account_id   = "terratest"
  display_name = "Terratest Service Account"
}

resource "google_project_iam_member" "log_writer" {
  project = "ami-playground"
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.sa.email}"
}

slevenick commented 4 years ago

Surprisingly I'm unable to reproduce this issue in my own project. If I add a user with a capital letter, it behaves the same way as in all of the cases described here, where Terraform lowercases any capital letters coming from the API, but in all of my cases the API accepts the lowercase version.

For example, the API will return:

"role": "roles/browser",
"members": [
"user:MyUser@gmail.com"
 ],

I add a binding with a different user, posting back a policy with

"role": "roles/browser",
"members": [
"user:myuser@gmail.com"
 ],

Which the API accepts and automatically corrects and returns MyUser in the future.

I'm trying to debug with the team internally, and may reach out to some of you for help in reproducing this for them

akrasnov-drv commented 4 years ago

@slevenick Thank you for the efforts :) Try using the user I sent you by mail. In my project it breaks binding functions with 100% consistency. I added and removed it already about 5-7 times. In my project this user has "owner" rights if it changes anything.

cee-dub commented 4 years ago

I was just experiencing what seems like a related issue to this and #4276 and was able to solve it. Maybe this can help others in the thread.

I have a resource "google_project_iam_custom_role", a data "google_iam_policy" (not certain this is required), and a resource "google_project_iam_member". The API was returning the error googleapi: Error 400: Role roles/myCustomRole is not supported for this resource., badRequest when trying to create the google_project_iam_member.

Returned the badRequest error:

resource "google_project_iam_member" "mem" {
  role   = "roles/${google_project_iam_custom_role.role.role_id}"
  member = "serviceAccount:${data.google_service_account.sa.email}"
}

Succeeded:

resource "google_project_iam_member" "mem" {
  role   = "projects/${var.project}/roles/${google_project_iam_custom_role.role.role_id}"
  member = "serviceAccount:${data.google_service_account.sa.email}"
}

slevenick commented 4 years ago

Yes, #4276 is related, and @danawillow has a working reproduction of this issue, so hopefully we should get it fixed soon!

I'll close this as a duplicate at this point as #4276 is the same issue

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!