sysdiglabs / terraform-google-secure-for-cloud

Terraform module that deploys the Sysdig Secure For Cloud stack in Google Cloud Platform
https://registry.terraform.io/modules/sysdiglabs/secure-for-cloud/google/latest
Apache License 2.0
2 stars 11 forks source link

Bug: Organization-level cloud-bench integration failure #144

Closed jdomeracki closed 1 year ago

jdomeracki commented 1 year ago

Hi Team, I'm 90% sure that I found a logic bug in the cloud-bench Terraform module which fails when using the organization-org_compliance example.

TL;DR The workaround proposed in the FAQ doesn't seem to work

Example instance of the module

locals {
  projects = ["ABC", "DEF"]
}

# Based on: https://registry.terraform.io/modules/sysdiglabs/secure-for-cloud/google/latest/examples/organization-org_compliance
provider "sysdig" {
  sysdig_secure_url = "https://sample.sysdig.com"
}

provider "google" {
  alias   = "multiproject"
  project = "sysdig-pov"
  region  = "us-central1"
}

provider "google-beta" {
  alias   = "multiproject"
  project = "sysdig-pov"
  region  = "us-central1"
}

module "secure-for-cloud_example_organization-org_compliance" {
  providers = {
    google.multiproject      = google.multiproject
    google-beta.multiproject = google-beta.multiproject
  }
  source                = "sysdiglabs/secure-for-cloud/google//examples/organization-org_compliance"
  version               = "0.9.9"
  organization_domain   = "sample.com"
  deploy_benchmark      = true
  benchmark_project_ids = local.projects
}

Terraform Plan

│ Error: Invalid index
│ 
│   on .terraform/modules/secure-for-cloud_example_organization-org_compliance/modules/services/cloud-bench-workload-identity/trust_relationship/main.tf line 27, in resource "sysdig_secure_cloud_account" "cloud_account":
│   27:   account_id                      = var.project_id_number_map[each.key]
│     ├────────────────
│     │ each.key is "ABC"
│     │ var.project_id_number_map is map of string with 1 element
│ 
│ The given key does not identify an element in this collection value.

Faulty logic

data "google_projects" "all_projects" {
  filter = "parent.id:${data.google_organization.org.org_id} parent.type:organization lifecycleState:ACTIVE"
}

locals {
  # If specific projects are specified, use that list. Otherwise, use all active projects in the org
  project_ids = length(var.project_ids) == 0 ? [for p in data.google_projects.all_projects.projects : p.project_id] : var.project_ids

  # Fetch both the project ID and project number (Needed by Workload Identity Federation)
  project_id_to_number_map = { for p in data.google_projects.all_projects.projects : p.project_id => p.number }
}

https://github.com/sysdiglabs/terraform-google-secure-for-cloud/blob/master/modules/services/cloud-bench-workload-identity/main.tf#L5-L15

  1. The list data.google_projects.all_projects.projects consist only of Projects whose direct parent is the Org itself (which in general is an antipattern)
  2. If var.project_ids = benchmark_project_ids isn't empty then project_ids = var.project_ids (which I assume was meant as the workaround)
  3. The issue stems from the fact that project_id_to_number_map operates on the data.google_projects.all_projects.projects list regardless wether benchmark_project_ids was provided or not
  4. This in consequence makes the project_number lookup fail since project_ids != [for p in data.google_projects.all_projects.projects : p.project_id]
resource "sysdig_secure_cloud_account" "cloud_account" {
  for_each                        = toset(local.project_ids)
  account_id                      = var.project_id_number_map[each.key]
  alias                           = each.key
  cloud_provider                  = "gcp"
  role_enabled                    = "true"
  role_name                       = var.role_name
  workload_identity_account_id    = var.project_id_number_map[var.project_id]
  workload_identity_account_alias = var.project_id
}

https://github.com/sysdiglabs/terraform-google-secure-for-cloud/blob/master/modules/services/cloud-bench-workload-identity/trust_relationship/main.tf#L25-L34

nkraemer-sysdig commented 1 year ago

Hi @jdomeracki thank you for raising this, and the thorough explanation!

This issue has been addressed in https://github.com/sysdiglabs/terraform-google-secure-for-cloud/pull/145, and released in version v0.9.10

Please don't hesitate to re-open this issue if the latest version of the module does not work as expected.

jdomeracki commented 1 year ago

Hi @nkraemer-sysdig, finally got around to testing the solution.

TL;DR It works ie. when a list of project IDs is provided the plan looks healthy 🎉

Having said that I do have a potential improvement to contribute. I timed the execution of the fetch-gcp-projects.sh script and it turned out to be painfully slow when run in an Org of our size:

> time ./fetch-gcp-projects.sh $ORG_ID
OMITTED
./fetch-gcp-projects.sh $ORG_ID  66.88s user 17.15s system 29% cpu 4:46.98 total

While it's pretty much a one time job and the logic looks fine there is an alternative Google API - Asset Inventory which makes the process of getting all Projects under given Org much easier. Listed below is an example gcloud query which achieves exactly that:

> time gcloud asset search-all-resources \
        --scope="organizations/$ORG_ID" \
        --asset-types="cloudresourcemanager.googleapis.com/Project" \
        --format="value(additionalAttributes.projectId)"
OMITTED
gcloud asset search-all-resources --scope="organizations/$ORG_ID" 1.66s user 0.21s system 6% cpu 27.990 total

Not only is it much faster but also the complexity and readability of the script could be greatly improved.