terraform-google-modules / terraform-example-foundation

Shows how the CFT modules can be composed to build a secure cloud foundation
https://cloud.google.com/architecture/security-foundations
Apache License 2.0
1.18k stars 702 forks source link

fix(VPCSC): enable dryrun mode #1210

Closed eeaton closed 1 week ago

eeaton commented 2 months ago

Address issue #1209 . CI tests have high flaky failure rate due in part to propagation delays with VPCSC perimeter. Changing this to dry mode better adheres to best practices for enablement and will avoid this issue in CI tests.

eeaton commented 2 months ago

Thanks Daniel & Amanda for the input.

ACK to add the changes consistently across step 4-projects, simple project module, and hub-and-spoke flow Will add directions on the Readme about what to set for VPCSC.

re: the flag, like our discussion over chat I realize now we need separate input variables, because it is a valid use case for customers to set the enforced perimeter and dryrun perimeter to separate configurations (they are two different resources managed by the same CFT module). I'll draft this proposal and update the PR.

eeaton commented 2 months ago

Quick update: I'm blocked by a provider error on CFT. https://github.com/terraform-google-modules/terraform-google-project-factory/issues/904

The project factory module has partially designed support for an argument vpc_service_control_attach_dry_run but doesn't complete the implementation, it's not actually exposed to the Terraform Google Provider.

I've experimented locally with a few different patterns consistently that set the perimeter to dry_run mode in the 3-networks and 4-projects stage, but until the provider is fixed I don't think I can implement it using the current project factory module.

daniel-cit commented 2 months ago

Quick update: I'm blocked by a provider error on CFT. terraform-google-modules/terraform-google-project-factory#904

The project factory module has partially designed support for an argument vpc_service_control_attach_dry_run but doesn't complete the implementation, it's not actually exposed to the Terraform Google Provider.

I've experimented locally with a few different patterns consistently that set the perimeter to dry_run mode in the 3-networks and 4-projects stage, but until the provider is fixed I don't think I can implement it using the current project factory module.

@eeaton new release for the project factory module v15.0.0

eeaton commented 2 months ago

I've made a few new commits, but no need to do a full review yet. I've got a working version of configuring the dry-run perimeter in my local environment for the dualsvpc pattern, but haven't tested it fully for hub-and-spoke (just copied off the key differences manually). I want to run the full CI tests to identify any configuration errors I might have introduced.

Once that is working well I'll add the readme changes and then ask for a review

daniel-cit commented 1 month ago

Dual Shared VPC build

Step #17 - "verify-networks":           Error:          Received unexpected error:
Step #17 - "verify-networks":                           FatalError{Underlying: error while running command: exit status 1; 
Step #17 - "verify-networks":                           Error: Output "restricted_access_level_name" not found
Step #17 - "verify-networks":                           
Step #17 - "verify-networks":                           The output variable requested could not be found in the state file. If you
Step #17 - "verify-networks":                           recently added this to your configuration, be sure to run `terraform apply`,
Step #17 - "verify-networks":                           since the state won't be updated with new output variables until that command
Step #17 - "verify-networks":                           is run.}
Step #17 - "verify-networks":           Test:           TestNetworks/nonproduction
eeaton commented 1 month ago

Thanks for helping identify the issue Daniel. I've fixed the tests to check for the correct variable name (I hadtweaked a few variables in code that I found to be misleading variable names but didn't apply the same changes to variables that the tests expected)

eeaton commented 2 weeks ago

Thanks @daniel-cit for the comments on readme and naming consistency.

I have a few more functional issues that I'm trying to make pass the CI tests, I'll tag you for a functional review once that is completed

daniel-cit commented 2 weeks ago

/gcbrun

daniel-cit commented 2 weeks ago

/gcbrun

eeaton commented 2 weeks ago

Looks like CI is repeatedly failing here, but I haven't been able to identify a more specific issue in the logs regarding what parts of the TestOrg specifically are failing:

Step #8 - "verify-org": FAIL Step #8 - "verify-org": FAIL github.com/terraform-google-modules/terraform-example-foundation/test/integration/org 228.598s Step #8 - "verify-org": FAIL

daniel-cit commented 2 weeks ago

@eeaton It looks like the verification is failing due to changes in the Terraform state.
The test framework checks (terraform plan) if there is any change in the Terraform state (Error 2) after the original apply.

Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: Note: Objects have changed outside of Terraform
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: Terraform detected the following changes made outside of Terraform since the
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: last "terraform apply" which may have affected this plan:
... 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: Terraform used the selected providers to generate the following execution
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: plan. Resource actions are indicated with the following symbols:
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:   ~ update in-place
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: Terraform will perform the following actions:
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:   # module.cai_monitoring.module.cloud_function.google_cloudfunctions2_function.function will be updated in-place
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:   ~ resource "google_cloudfunctions2_function" "function" {
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:         id               = "projects/t7i-c-scc-ub3o/locations/us-central1/functions/caiMonitoring"
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:         name             = "caiMonitoring"
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:         # (11 unchanged attributes hidden)
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:       ~ service_config {
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:           ~ environment_variables            = {
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:               - "LOG_EXECUTION_ID" = "true" -> null <===
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:                 # (2 unchanged elements hidden)
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:             }
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:             # (14 unchanged attributes hidden)
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:         }
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:         # (2 unchanged blocks hidden)
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185:     }
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: 
Step #8 - "verify-org": TestOrg 2024-06-28T11:37:22Z command.go:185: Plan: 0 to add, 1 to change, 0 to destroy.
Step #8 - "verify-org":     terraform.go:504: 
Step #8 - "verify-org":             Error Trace:    /builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.15.1/pkg/tft/terraform.go:504
Step #8 - "verify-org":                                         /workspace/test/integration/org/org_test.go:107
Step #8 - "verify-org":                                         /builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.15.1/pkg/tft/terraform.go:605
Step #8 - "verify-org":                                         /builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.15.1/pkg/tft/terraform.go:637
Step #8 - "verify-org":                                         /builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.15.1/pkg/utils/stages.go:31
Step #8 - "verify-org":                                         /builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/blueprint-test@v0.15.1/pkg/tft/terraform.go:637
Step #8 - "verify-org":             Error:          Should not be: 2
Step #8 - "verify-org":             Test:           TestOrg

the attribute with the changed value environment_variables.LOG_EXECUTION_ID is not declared in the foundation or in the Cloud Function module we are using, it looks like it was added by the API itself.

We may try to setup

environment_variables ={
"LOG_EXECUTION_ID" = "true"
}

to see if this will solve the issue.

I will do a local test to see if this will solve the problem.

eeaton commented 2 weeks ago

Got it, I've started seeing the same error on the other PR about fixing the SA, even though I don't expect that either PR changes things that should impact the TestOrg step, so that does suggest it's a change from the API itself that will break the CI on every branch.

eeaton commented 2 weeks ago

The tests in networks_test.go passed, but now CI failed for a similar test in projects_tests.go. I've applied similar logic that we used in networks_test.go, to check project membership against the dry-run diff object.

eeaton commented 1 week ago

@daniel-cit Woo hoo, all tests are finally green! Thanks again for all the help with this PR.

When you have a chance can you review and approve please?

daniel-cit commented 1 week ago

@daniel-cit Woo hoo, all tests are finally green! Thanks again for all the help with this PR.

When you have a chance can you review and approve please?

I will do a deploy to test the upgrade path https://github.com/eeaton/terraform-example-foundation/blob/fix-1209-vpcsc-dryrun/3-networks-dual-svpc/README.md#optional-enforce-vpc-service-controls

eeaton commented 1 week ago

@apeabody can I please get your formal review as a code owner to merge? I've already done a detailed functional review with Daniel

daniel-cit commented 1 week ago

/gcbrun