Closed lpezet closed 1 week ago
Submitted PR #1273 to fix this issue.
@lpezet can you please confirm whether you're still seeing the issue after this change?
@eeaton The behavior I mentioned in #1273 was happening before and after implementing the fix from #1206. I'll re-run it as soon as I get the chance (been busy) but if I can confirm my fix does address the issue, I'd love to find a way to add that in the tests (is it possible to "delay"/slow down group creation before the seed project configuration?).
@eeaton It's proving difficult to destroy everything 0-bootstrap created. I only provided the minimum (org_id, billing_account, groups object, default_region* and gh_repos information in terraform.tfvars) and I now realize I should have looked at the bucket_tfstate_kms_force_destroy and bucket_force_destroy variables as well to make it possible to redo this whole process again and again (something I wanted to do from the beginning).
Now running into issues like:
│ Error: error loading state: Failed to open state file at gs://bkt-prj-b-seed-tfstate-XXXX/terraform/bootstrap/state/default.tfstate: googleapi: got HTTP response code 403 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Permission denied on Cloud KMS key. Please ensure that your Cloud Storage service account has been authorized to use this key.</Message></Error>
If you have any tips on what to specify/do at the beginning to be able to go through 0-bootstrap and then destroy everything cleanly to repeat, please let me know so I can use that next time.
I'd love to find a way to add that in the tests (is it possible to "delay"/slow down group creation before the seed project configuration?).
In general yes, and we do have a number of sleep timers and retry logic where resources aren't available to reference on GCP immediately after terraform apply commands. However, I subsequently added some details to #1206 that identifies the root cause as a permissions issue, so I don't think adding more sleep timers would make a difference here.
│ Error: error loading state: Failed to open state file at gs://bkt-prj-b-seed-tfstate-XXXX/terraform/bootstrap/state/default.tfstate: googleapi: got HTTP response code 403 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Permission denied on Cloud KMS key. Please ensure that your Cloud Storage service account has been authorized to use this key.</Message></Error>
From the error, you might have cryptoshredded yourself (deleting the encryption key makes resources completely inaccessible).
A few things to try:
bucket_tfstate_kms_force_destroy and bucket_force_destroy ... any tips
Set parent_folder to a unique folder each time, so each run deploys an isolated instance of the foundation to that folder instead of at the org node. Also set create_unique_tag_key to true to avoid a global clash at the org level.
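As a rough sketch, the variables mentioned in this comment could be combined in terraform.tfvars as shown below. The variable names come from this thread. The values, the format of the folder ID, and which stage's terraform.tfvars each entry belongs in are assumptions to check against that stage's variables.tf.

```hcl
# Sketch only: extra terraform.tfvars entries for a throwaway, repeatable deployment.
# Variable names are taken from this thread; every value is a placeholder.

# Intended to let `terraform destroy` clean up the state bucket and its
# KMS-encrypted contents (check each variable's description before relying on it).
bucket_force_destroy             = true
bucket_tfstate_kms_force_destroy = true

# Hypothetical folder ID: deploy under a dedicated folder instead of the org node.
# The expected format (bare numeric ID vs. "folders/<id>") is an assumption.
parent_folder = "000000000000"

# Avoids tag key name clashes at the org level between repeated deployments.
# This variable may belong to a later stage than 0-bootstrap (assumption).
create_unique_tag_key = true
```

These need to be set before the first terraform apply. Changing them after the KMS key has already been destroyed will not make the existing state bucket readable again.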
In general yes, and we do have a number of sleep timers and retry logic where resources aren't available to reference on GCP immediately after terraform apply commands. However, I subsequently added some details to #1206 that identifies the root cause as a permissions issue, so I don't think adding more sleep timers would make a difference here.
I meant it as a way to confirm this is an issue: add sleep timer(s) in the test (when creating the required groups) to see if the seed_bootstrap module breaks with the error I experienced, thereby replicating my situation. This is a race condition in the end, isn't it? Then test with my fix to see whether it addresses the issue or not. That's what I meant. Sorry for the confusion.
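To make the idea concrete, here is a minimal sketch of how group creation could be delayed artificially. It assumes the hashicorp/time provider is available. The resource name delay_group_creation and the 120s duration are hypothetical, and this is not something the repo's tests currently do.

```hcl
# Sketch only: artificially delay group creation to try to reproduce the ordering problem.
# Assumes the hashicorp/time provider; names and duration are hypothetical.

resource "time_sleep" "delay_group_creation" {
  # Wait this long on create before anything that depends on this resource starts.
  create_duration = "120s"
}

module "required_group" {
  # ... existing source and arguments unchanged ...

  # Groups are only created once the sleep has finished.
  depends_on = [time_sleep.delay_group_creation]
}
```

If module.seed_bootstrap has no dependency on module.required_group, it would start while the sleep is still running, which should surface the error if ordering is the cause.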
I did cryptoshred myself, didn't I? lol Thanks for the tips.
From the discussion in #1206 I don't think this is a race condition. It looks like different permissions apply depending on who creates the groups. When the service account creates the groups, it automatically gets OWNER permission on the Cloud Identity resources. When the user manually creates the groups in the Cloud Identity admin console, the service account doesn't have any permissions for Cloud Identity, which manages permissions outside of Google Cloud IAM policies. I've made it a backlog item to improve the overall guidance to steer people away from this edge case in a future release.
I'll close this issue for now, but feel free to re-open if you disagree.
@eeaton My bad. I was referring to this issue, #1272, and NOT #1206 all this time. When you said:
#1273 has been merged, but from reading the details of #1206 I'm not certain whether this solves the issue. (...)
@lpezet can you please confirm whether you're still seeing the issue after this change?
I thought you meant whether fix #1273 addressed this issue #1272, based on what was said in #1206. I can confirm fix #1273 worked for me. I would have liked to contribute a way to test it effectively, but I don't fully understand how the tests work and couldn't find anything relevant at first sight in test/integration/bootstrap/bootstrap_test.go.
Got it, thanks for clarifying.
If you're interested, here's a codelab introducing the test framework used by this repo and others based off of CFT: https://codelabs.developers.google.com/cft-onboarding
For this particular repo, though, I think running all the tests locally is an unreasonable burden for contributors trying to make a small fix. (Even assuming everything goes smoothly, it takes multiple hours to deploy all the infra and run tests and tear down again)
When you raise a PR, all the tests run on the backend before it can be approved to merge. My practical rule of thumb for this enormous repo is to run the minimum locally: make docker_test_lint and make docker_generate_docs to catch obvious issues, then leave the detailed tests to the CI workflow triggered on a PR.
TL;DR
I'm going through https://github.com/terraform-google-modules/terraform-example-foundation/blob/master/0-bootstrap/README-GitHub.md. When running either step 21 or 31 (if letting the pipeline create the groups), the following error can (did) happen (I did obfuscate values, using example.com and fake org id):
Expected behavior
Running terraform apply only once.
Observed behavior
Going through https://github.com/terraform-google-modules/terraform-example-foundation/blob/master/0-bootstrap/README-GitHub.md, I had this issue at step 23. Run terraform apply.
I re-ran it and it went fine. I encountered issue #1206 and after running through fix https://github.com/terraform-google-modules/terraform-example-foundation/issues/1206#issuecomment-2082315445, step 31. The Pull request will trigger... gave the same error.
Terraform Configuration
Terraform Version
Additional information
I believe the fix (I'll propose one) is for module.seed_bootstrap to depend on module.required_group, so that the groups are created before terraform-google-modules/bootstrap/google (module.seed_bootstrap) executes the google_organization_iam_member resources.
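A minimal sketch of that dependency, assuming Terraform 0.13 or later (module-level depends_on is not available before that). The other module arguments are elided here, and the exact change in the proposed fix may differ.

```hcl
# Sketch only: make the seed project module wait for group creation.
module "seed_bootstrap" {
  source = "terraform-google-modules/bootstrap/google"

  # ... existing arguments unchanged ...

  # Ensure the groups exist before google_organization_iam_member
  # resources inside this module reference them.
  depends_on = [module.required_group]
}
```

The trade-off of a module-level depends_on is that Terraform treats everything in module.seed_bootstrap as dependent on every resource in module.required_group, so plans can be more conservative than with a finer-grained reference.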