terraform-google-modules / terraform-google-lb-http

Creates a global HTTP load balancer for Compute Engine by using forwarding rules
https://registry.terraform.io/modules/terraform-google-modules/lb-http/google
Apache License 2.0

Integration test suite does not handle project creation failures #64

Closed jeremywadsack closed 4 years ago

jeremywadsack commented 4 years ago

If an error occurs while setting up the integration tests, the script apparently tries to delete the project and then create it again, but the re-creation fails:

module.project-ci-lb-http.module.project-factory.data.null_data_source.default_service_account: Refreshing state...
module.project-ci-lb-http.module.project-factory.google_project.main: Destroying... [id=ci-int-lb-http-ef72]
module.project-ci-lb-http.module.project-factory.google_project.main: Destruction complete after 4s
module.project-ci-lb-http.module.project-factory.google_project.main: Creating...

Error: error creating project ci-int-lb-http-ef72 (ci-int-lb-http): googleapi: Error 409: Requested entity already exists, alreadyExists. If you received a 403 error, make sure you have the `roles/resourcemanager.projectCreator` permission

I think this happens because Google keeps the project around for a while for "undelete".

To fix this, I had to destroy the random project ID suffix resource with Terraform:

docker run --rm -it \
        -e SERVICE_ACCOUNT_JSON \
        -e TF_VAR_org_id \
        -e TF_VAR_folder_id \
        -e TF_VAR_billing_account \
        -v "$(pwd)":/workspace \
        gcr.io/cloud-foundation-cicd/cft/developer-tools:0.4.2 \
        bash -c 'cd test/setup; /usr/local/bin/execute_with_credentials.sh terraform destroy --target module.project-ci-lb-http.module.project-factory.random_id.random_project_id_suffix'

Then I could re-run the make docker_test_prepare command.

I don't know enough about the test suite. Is this a problem in here or should I post this issue to terraform-google-modules/terraform-google-project-factory?
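A minimal sketch of how the address passed to `--target` in the workaround above could be looked up rather than typed by hand. The helper name `find_suffix_addresses` is hypothetical, and the sketch assumes the input is a list of resource addresses, one per line, such as `terraform state list` prints:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads resource addresses on stdin (one per line) and
# prints only those that refer to the random project ID suffix.
find_suffix_addresses() {
  # `|| true` keeps the function from failing when nothing matches.
  grep -F 'random_id.random_project_id_suffix' || true
}

# Sample input built from resource addresses that appear in this issue.
find_suffix_addresses <<'EOF'
module.project-ci-lb-http.module.project-factory.google_project.main
module.project-ci-lb-http.module.project-factory.random_id.random_project_id_suffix
EOF
```

In practice this would presumably be run as `terraform state list | find_suffix_addresses` from `test/setup`, inside the same developer-tools container that the workaround uses.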

morgante commented 4 years ago

@jeremywadsack What command caused the destruction? I don't think anything in the test suite will attempt to recreate a failed project automatically.

jeremywadsack commented 4 years ago

@morgante Running make docker_test_integration after an error (such as the permissions or API errors that I documented in #65) caused the destruction and re-creation of the GCP project.

I couldn't figure out why it would delete the project resource either. Maybe terraform identified it as existing but not complete or ready? In providers/google/google_project, the project_id attribute says "Changing this forces a new project to be created." Not sure if that helps or if it is a red herring.

morgante commented 4 years ago

The weird thing is that make docker_test_integration shouldn't be creating projects at all: project creation is handled in the prepare command. Any chance you have a copy of the log output?

jeremywadsack commented 4 years ago

Sorry, you're correct there. It's the make docker_test_prepare command in both cases. (I updated the original description above to reflect that.)

morgante commented 4 years ago

Got it. In that case, I guess it's "expected behavior" that you cannot recreate a project with the same ID. I'm not sure what would've caused destruction in the first place though, and it'll be hard to debug unless you can provide a Terraform plan.
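One way to confirm that a deleted project is still holding on to its ID is to look at its lifecycle state. The sketch below is an illustration only: the helper name `describe_project_state` is hypothetical, and it assumes the state string comes from the `lifecycleState` field that `gcloud projects describe` reports, where `DELETE_REQUESTED` means the project is pending deletion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns a project lifecycle state into a short
# explanation of whether the project ID is still taken.
describe_project_state() {
  case "$1" in
    ACTIVE)           echo "project exists and is active" ;;
    DELETE_REQUESTED) echo "project is pending deletion; its ID is still taken" ;;
    *)                echo "unrecognized state: $1" ;;
  esac
}

# Example with a fixed state string. With gcloud installed and permission to
# view the project, the state could instead be read with something like:
#   gcloud projects describe ci-int-lb-http-5671 --format='value(lifecycleState)'
describe_project_state DELETE_REQUESTED
```

If the project turns out to be pending deletion, `gcloud projects undelete` can restore it under the same ID; otherwise a new ID is needed, which is what destroying the random suffix achieves.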

jeremywadsack commented 4 years ago

Full log of a run (with some redactions):

$ make docker_test_prepare
docker run --rm -it \
        -e SERVICE_ACCOUNT_JSON \
        -e TF_VAR_org_id \
        -e TF_VAR_folder_id \
        -e TF_VAR_billing_account \
        -v "/Source/terraform-google-lb-http":/workspace \
        gcr.io/cloud-foundation-cicd/cft/developer-tools:0.4.2 \
        /usr/local/bin/execute_with_credentials.sh prepare_environment
Updated property [core/pass_credentials_to_gsutil].
Activated service account credentials for: [XXXXXXX.iam.gserviceaccount.com]
Updated property [core/pass_credentials_to_gsutil].
Activated service account credentials for: [XXXXXXX.iam.gserviceaccount.com]
Initializing modules...

Initializing the backend...

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.null: version = "~> 2.1"
* provider.random: version = "~> 2.2"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
module.project-ci-lb-http.module.project-factory.random_id.random_project_id_suffix: Refreshing state... [id=VnE]
module.project-ci-lb-http.module.project-factory.null_resource.shared_vpc_subnet_invalid_name[0]: Refreshing state... [id=905541678552877136]
module.project-ci-lb-http.module.project-factory.null_resource.preconditions: Refreshing state... [id=5575721642416739487]
module.project-ci-lb-http.module.gsuite_group.data.google_organization.org[0]: Refreshing state...
module.project-ci-lb-http.module.project-factory.google_project.main: Refreshing state... [id=ci-int-lb-http-5671]
module.project-ci-lb-http.module.project-factory.data.null_data_source.default_service_account: Refreshing state...
module.project-ci-lb-http.module.project-factory.google_project.main: Destroying... [id=ci-int-lb-http-5671]
module.project-ci-lb-http.module.project-factory.google_project.main: Destruction complete after 4s
module.project-ci-lb-http.module.project-factory.google_project.main: Creating...

Error: error creating project ci-int-lb-http-5671 (ci-int-lb-http): googleapi: Error 409: Requested entity already exists, alreadyExists. If you received a 403 error, make sure you have the `roles/resourcemanager.projectCreator` permission

  on .terraform/modules/project-ci-lb-http/terraform-google-modules-terraform-google-project-factory-f93d3cd/modules/core_project_factory/main.tf line 126, in resource "google_project" "main":
 126: resource "google_project" "main" {

make: *** [docker_test_prepare] Error 1

jeremywadsack commented 4 years ago

And here is the terraform plan [UPDATED]:

module.project-ci-lb-http.module.project-factory.null_resource.shared_vpc_subnet_invalid_name[0]: Refreshing state... [id=905541678552877136]
module.project-ci-lb-http.module.project-factory.null_resource.preconditions: Refreshing state... [id=5575721642416739487]
module.project-ci-lb-http.module.gsuite_group.data.google_organization.org[0]: Refreshing state...
module.project-ci-lb-http.module.project-factory.random_id.random_project_id_suffix: Refreshing state... [id=VpY]
module.project-ci-lb-http.module.project-factory.google_project.main: Refreshing state... [id=ci-int-lb-http-5696]
module.project-ci-lb-http.module.project-factory.data.null_data_source.default_service_account: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
-/+ destroy and then create replacement
 <= read (data resources)

Terraform will perform the following actions:

  # google_project_iam_member.int_test[0] will be created
  + resource "google_project_iam_member" "int_test" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + member  = (known after apply)
      + project = "ci-int-lb-http-5696"
      + role    = "roles/owner"
    }

  # google_project_iam_member.int_test[1] will be created
  + resource "google_project_iam_member" "int_test" {
      + etag    = (known after apply)
      + id      = (known after apply)
      + member  = (known after apply)
      + project = "ci-int-lb-http-5696"
      + role    = "roles/storage.admin"
    }

  # google_service_account.int_test will be created
  + resource "google_service_account" "int_test" {
      + account_id   = "ci-int-lb-http"
      + display_name = "ci-int-lb-http"
      + email        = (known after apply)
      + id           = (known after apply)
      + name         = (known after apply)
      + project      = "ci-int-lb-http-5696"
      + unique_id    = (known after apply)
    }

  # google_service_account_key.int_test will be created
  + resource "google_service_account_key" "int_test" {
      + id                      = (known after apply)
      + key_algorithm           = "KEY_ALG_RSA_2048"
      + name                    = (known after apply)
      + private_key             = (sensitive value)
      + private_key_encrypted   = (known after apply)
      + private_key_fingerprint = (known after apply)
      + private_key_type        = "TYPE_GOOGLE_CREDENTIALS_FILE"
      + public_key              = (known after apply)
      + public_key_type         = "TYPE_X509_PEM_FILE"
      + service_account_id      = (known after apply)
      + valid_after             = (known after apply)
      + valid_before            = (known after apply)
    }

  # module.project-ci-lb-http.module.project-factory.data.null_data_source.default_service_account will be read during apply
  # (config refers to values not yet known)
 <= data "null_data_source" "default_service_account"  {
      + has_computed_default = (known after apply)
      + id                   = (known after apply)
      + inputs               = {
          + "email" = (known after apply)
        }
      + outputs              = (known after apply)
      + random               = (known after apply)
    }

  # module.project-ci-lb-http.module.project-factory.google_project.main is tainted, so must be replaced
-/+ resource "google_project" "main" {
      ~ app_engine          = [] -> (known after apply)
        auto_create_network = false
        billing_account     = "018226-824C2D-4372EC"
        folder_id           = "65631194888"
      ~ id                  = "ci-int-lb-http-5696" -> (known after apply)
      - labels              = {} -> null
        name                = "ci-int-lb-http"
      ~ number              = "1073268911910" -> (known after apply)
      + org_id              = (known after apply)
      + policy_data         = (known after apply)
      + policy_etag         = (known after apply)
        project_id          = "ci-int-lb-http-5696"
      + skip_delete         = (known after apply)
    }

  # module.project-ci-lb-http.module.project-factory.google_project_service.project_services[0] will be created
  + resource "google_project_service" "project_services" {
      + disable_dependent_services = true
      + disable_on_destroy         = true
      + id                         = (known after apply)
      + project                    = "ci-int-lb-http-5696"
      + service                    = "cloudresourcemanager.googleapis.com"
    }

  # module.project-ci-lb-http.module.project-factory.google_project_service.project_services[1] will be created
  + resource "google_project_service" "project_services" {
      + disable_dependent_services = true
      + disable_on_destroy         = true
      + id                         = (known after apply)
      + project                    = "ci-int-lb-http-5696"
      + service                    = "storage-api.googleapis.com"
    }

  # module.project-ci-lb-http.module.project-factory.google_project_service.project_services[2] will be created
  + resource "google_project_service" "project_services" {
      + disable_dependent_services = true
      + disable_on_destroy         = true
      + id                         = (known after apply)
      + project                    = "ci-int-lb-http-5696"
      + service                    = "serviceusage.googleapis.com"
    }

  # module.project-ci-lb-http.module.project-factory.google_project_service.project_services[3] will be created
  + resource "google_project_service" "project_services" {
      + disable_dependent_services = true
      + disable_on_destroy         = true
      + id                         = (known after apply)
      + project                    = "ci-int-lb-http-5696"
      + service                    = "compute.googleapis.com"
    }

  # module.project-ci-lb-http.module.project-factory.google_service_account.default_service_account will be created
  + resource "google_service_account" "default_service_account" {
      + account_id   = "project-service-account"
      + display_name = "ci-int-lb-http Project Service Account"
      + email        = (known after apply)
      + id           = (known after apply)
      + name         = (known after apply)
      + project      = "ci-int-lb-http-5696"
      + unique_id    = (known after apply)
    }

Plan: 10 to add, 0 to change, 1 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

morgante commented 4 years ago

So I do see this from the plan, but it's unclear what could've caused that:

  # module.project-ci-lb-http.module.project-factory.google_project.main is tainted, so must be replaced
-/+ resource "google_project" "main" {
      ~ app_engine          = [] -> (known after apply)
        auto_create_network = false
        billing_account     = "018226-824C2D-4372EC"
        folder_id           = "65631194888"
      ~ id                  = "ci-int-lb-http-5696" -> (known after apply)
      - labels              = {} -> null
        name                = "ci-int-lb-http"
      ~ number              = "1073268911910" -> (known after apply)
      + org_id              = (known after apply)
      + policy_data         = (known after apply)
      + policy_etag         = (known after apply)
        project_id          = "ci-int-lb-http-5696"
      + skip_delete         = (known after apply)
    }

Any chance you manually tainted the project?
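A manual `terraform taint` is not the only way to reach this state: Terraform generally marks a resource as tainted on its own when the provider reports an error partway through creating it. Whether that is what happened here is a guess. To spot tainted resources quickly in a long plan, something like the sketch below could help. The helper name `tainted_addresses` is hypothetical, and it assumes uncolored plan text in the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Terraform plan text on stdin and prints the
# address of every resource that the plan reports as tainted.
tainted_addresses() {
  sed -n 's/^ *# \(.*\) is tainted, so must be replaced$/\1/p'
}

# Sample input: two header lines copied from the plan in this issue.
tainted_addresses <<'EOF'
  # google_service_account.int_test will be created
  # module.project-ci-lb-http.module.project-factory.google_project.main is tainted, so must be replaced
EOF
```

A typical invocation would presumably be `terraform plan -no-color | tainted_addresses`. If the existing project is actually healthy, `terraform untaint` with the printed address should clear the mark so the project is not destroyed and re-created.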

jeremywadsack commented 4 years ago

I didn't do anything on this project. I just ran the docker_test_prepare task with the Service Usage API disabled in the service account's project to get it into this state. This is a brand new project that the task created (as per above, I removed the random number resource to have it create a new resource).

morgante commented 4 years ago

Unfortunately, I'm not able to reproduce this, and now that we have documentation, hopefully nobody else will run into it.

Just to confirm: everything is working for you now? Assuming yes, I'm going to close this.

jeremywadsack commented 4 years ago

I've got everything working, yes.

morgante commented 4 years ago

If anyone else encounters this, please let us know and we'll reopen.