terraform-google-modules / terraform-example-foundation

Shows how the CFT modules can be composed to build a secure cloud foundation
https://cloud.google.com/architecture/security-foundations
Apache License 2.0

Error: Invalid template interpolation value - var.python_interpreter_path #236

Closed: kbroughton closed this issue 4 years ago

kbroughton commented 4 years ago

I believe all the steps 0-3 are now completed

I'm hitting the following error in gcp-projects, on the plan step for the production branch, running as org admin++ (basically my superuser).

Error: Invalid template interpolation value
Step #2 - "tf plan": 
Step #2 - "tf plan":   on .terraform/modules/restricted_shared_vpc_project.project/modules/core_project_factory/locals.tf line 30, in locals:
Step #2 - "tf plan":   30:   preconditions_command = "${var.python_interpreter_path} ${local.preconditions_py_absolute_path} %{for key, value in local.attributes}--${key}=\"${value}\" %{endfor}"
Step #2 - "tf plan": 
Step #2 - "tf plan": The expression result is null. Cannot include a null value in a string
Step #2 - "tf plan": template.

This appears to be coming from

cat ./0-bootstrap/.terraform/modules/seed_bootstrap.seed_project/modules/core_project_factory/locals.tf

locals {
  root_path                      = abspath(path.root)
  preconditions_path             = join("/", [local.root_path, path.module, "scripts", "preconditions"])
  pip_requirements_absolute_path = join("/", [local.preconditions_path, "requirements.txt"])
  preconditions_py_absolute_path = join("/", [local.preconditions_path, "preconditions.py"])
  attributes = {
    billing_account             = var.billing_account
    org_id                      = var.org_id
    credentials_path            = var.credentials_path
    impersonate_service_account = var.impersonate_service_account
    folder_id                   = var.folder_id
    shared_vpc                  = var.shared_vpc
  }
  preconditions_command = "${var.python_interpreter_path} ${local.preconditions_py_absolute_path} %{for key, value in local.attributes}--${key}=\"${value}\" %{endfor}"
}
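As a side note, this class of error just means a null value ended up inside a string template. A tiny standalone config (purely illustrative, nothing to do with the foundation itself) reproduces the same message:

mkdir -p /tmp/null-template-repro && cd /tmp/null-template-repro
# Interpolating a null variable into a string template triggers the same error at plan time.
cat > main.tf <<'EOF'
variable "python_interpreter_path" {
  type    = string
  default = null
}

output "preconditions_command" {
  value = "${var.python_interpreter_path} preconditions.py"
}
EOF
terraform init && terraform plan
# Error: Invalid template interpolation value
# The expression result is null. Cannot include a null value in a string template.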

If I git clone terraform-google-project-factory and cd to modules/core_project_factory/scripts/preconditions, I can run the check locally. With only the required BILLING_ACCOUNT and ORG_ID, the script returns nothing. Is that success or failure?

Adding the optional variables, except impersonate_service_account and credentials_path, I get:

python preconditions.py --org_id $ORG_ID --billing_account $BILLING_ACCOUNT --shared_vpc $SHARED_VPC_D --folder_id $CFT_FOLDER
[
    {
        "type": "Required APIs on service account project",
        "name": "projects/cft-seed-XXXX",
        "satisfied": [
            "iam.googleapis.com",
            "cloudbilling.googleapis.com",
            "admin.googleapis.com",
            "cloudresourcemanager.googleapis.com"
        ],
        "unsatisfied": []
    },
    {
        "type": "Service account permissions on billing account",
        "name": "billingAccounts/XXXXX-YYYYYY-ZZZZZZ",
        "satisfied": [
            "billing.resourceAssociations.create"
        ],
        "unsatisfied": []
    },
    {
        "type": "Service account permissions on host VPC project",
        "name": "vpc-d-shared-base",
        "satisfied": [],
        "unsatisfied": [
            "resourcemanager.projects.setIamPolicy"
        ]
    },
    {
        "type": "Service account permissions on parent folder",
        "name": "folders/$FOLDER_ID",
        "satisfied": [
            "resourcemanager.projects.create"
        ],
        "unsatisfied": []
    },
    {
        "type": "Service account permissions on organization",
        "name": "organizations/$ORG_ID",
        "satisfied": [
            "compute.subnetworks.setIamPolicy"
        ],
        "unsatisfied": [
            "compute.organizations.enableXpnResource"
        ]
    }
]

This suggests it might be using the application_default_credentials.json. Perhaps I need to impersonate? However, that gives:

python preconditions.py --org_id $ORG_ID --billing_account $BILLING_ACCOUNT \
          --shared_vpc $SHARED_VPC_D --folder_id $CFT_FOLDER \
          --impersonate_service_account $SERVICE_ACCOUNT
Traceback (most recent call last):
  File "preconditions.py", line 492, in <module>
    retcode = main(sys.argv)
  File "preconditions.py", line 468, in main
    opts.impersonate_service_account)
TypeError: cannot unpack non-iterable Credentials object

using
$ echo $SERVICE_ACCOUNT
org-terraform@cft-seed-XXXX.iam.gserviceaccount.com

It's not clear to me whether I should be using the cft-seed SA or the cft-cloudbuild SA, or whether my superuser credentials are getting downscoped when the application default credentials are created. I tried regenerating the default creds with gcloud auth application-default, with and without --impersonate-service-account, but hit other errors down that route.
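For reference, the impersonated regeneration attempt looked roughly like this (a sketch only; whether application-default login supports --impersonate-service-account depends on the SDK version):

gcloud auth application-default login \
  --impersonate-service-account=org-terraform@cft-seed-XXXX.iam.gserviceaccount.com
# and the plain variant:
gcloud auth application-default login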

Any help would be appreciated. I tried turning on --verbose for the preconditions.py tool, hoping to get verbose SDK output, but that didn't work. I'm curious which credentials are being used in the checks and how to generate the correct ones.

rjerrems commented 4 years ago

Hi @kbroughton, thanks for reporting the issue. Assuming you are running locally, can you please provide the following details to help debug: your Terraform version, your gcloud version, and any changes you have made to the foundation code?

It is a little odd that you would have only seen issues in the projects stage if steps 0-3 executed successfully, as they should be using the exact same version of the project factory module.

kbroughton commented 4 years ago

To be clear, I'm following the Cloud Build instructions; however, 0-bootstrap and 3-networks/envs/shared seemed to require a local terraform apply.

Terraform v0.12.29

gcloud version:
Google Cloud SDK 307.0.0
bq 2.0.59
core 2020.08.21
gsutil 4.53
OS 10.15.6

No substantial changes to the foundation code, only *.tfvars changes. The only other thing I can think of: after running 3-networks/envs/shared on my laptop (there didn't seem to be another option) with Terraform 0.12.29 installed, I went to apply 3-networks/envs/non-prod/prod/dev using the git/Cloud Build flow, but got errors about the tfstate having been created with 0.12.29 while the current Terraform version was 0.12.24. I tracked that down to the following:

0-bootstrap/.terraform/modules/seed_bootstrap/modules/cloudbuild/cloudbuild_builder/Dockerfile:

ARG TERRAFORM_VERSION=0.12.24

That file is only available after terraform init. So I rebuilt the builder image locally with the newer Terraform version and pushed it to the cft-cloudbuild/terraform registry, overwriting the latest tag. That fixed the version mismatch, and I can't imagine it causing the permissions issues.
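Roughly what I did to rebuild and push the image (project ID and tag are illustrative):

cd 0-bootstrap/.terraform/modules/seed_bootstrap/modules/cloudbuild/cloudbuild_builder
# Override the Dockerfile's default TERRAFORM_VERSION and overwrite the latest tag.
docker build --build-arg TERRAFORM_VERSION=0.12.29 -t gcr.io/cft-cloudbuild-XXXX/terraform:latest .
docker push gcr.io/cft-cloudbuild-XXXX/terraform:latest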

bharathkkb commented 4 years ago

Hi @kbroughton, a quick clarification

I believe all the steps 0-3 are now completed

Did this stage run and apply fine previously?

kbroughton commented 4 years ago

It was a bit rocky, but I did get successful terraform apply runs for stage and development on 0-3 (locally for 0-bootstrap and 3-networks/envs/shared, Cloud Build for the rest). The errors I'm facing now are from 4-projects; I've never had a working run of it.

kbroughton commented 4 years ago

Digging a little more into my issues deploying 4-projects...

I can dig into the Cloud Build run by inspecting the Dockerfile from 0-bootstrap above. The Dockerfile shows that the entrypoint is /builder/entrypoint.bash, which I can look at inside the container.

4-projects-customized$ docker run -it --entrypoint '' -v $PWD:/data  gcr.io/cft-cloudbuild-bae1/terraform  bash

$ cat /builder/entrypoint.bash

active_account=""
function get-active-account() {
  active_account=$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2> /dev/null)
}

function activate-service-key() {
  rootdir=/root/.config/gcloud-config
  mkdir -p $rootdir
  tmpdir=$(mktemp -d "$rootdir/servicekey.XXXXXXXX")
  trap "rm -rf $tmpdir" EXIT
  echo ${GCLOUD_SERVICE_KEY} | base64 --decode -i > ${tmpdir}/gcloud-service-key.json
  gcloud auth activate-service-account --key-file ${tmpdir}/gcloud-service-key.json --quiet
  get-active-account
}
<snip>

So Cloud Build runs this container and grabs GCLOUD_SERVICE_KEY from the environment. The Cloud Build run is failing on the plan step, so I presume that is 4-projects/cloudbuild-tf-plan.yaml, shown below:

timeout: 1200s
substitutions:
  _POLICY_REPO: '' # add path to policies here https://github.com/forseti-security/policy-library/blob/master/docs/user_guide.md#how-to-use-terraform-validator
steps:
- id: 'setup'
  name: gcr.io/$PROJECT_ID/terraform
  entrypoint: /bin/bash
  args:
  - -c
  - |
    echo "Setting up gcloud for impersonation"
    gcloud config set auth/impersonate_service_account ${_TF_SA_EMAIL}
    echo "Adding bucket information to backends"
    for i in `find -name 'backend.tf'`; do sed -i 's/UPDATE_ME/${_STATE_BUCKET_NAME}/' $i; done

# [START tf-plan_validate_all]
- id: 'tf plan validate all'
  name: gcr.io/${PROJECT_ID}/terraform
  entrypoint: /bin/bash
  args:
  - -c
  - |
      ./tf-wrapper.sh plan_validate_all ${BRANCH_NAME} ${_POLICY_REPO}

artifacts:
  objects:
    location: 'gs://${_ARTIFACT_BUCKET_NAME}/terraform/cloudbuild/plan/${BUILD_ID}'
    paths: ['cloudbuild-tf-plan.yaml', 'tmp_plan/*.tfplan']

I can see the "echo" statements above in the Cloud Build log. In the GCP Console, under Cloud Build | Build Details | Execution Details | User Substitutions, I can see some of the variables, such as _TF_SA_EMAIL, which looks right (but not GCLOUD_SERVICE_KEY or PROJECT_ID).
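One way to dump a build's substitutions from the CLI (illustrative; the build ID comes from the Cloud Build history page):

gcloud builds describe BUILD_ID \
  --project=cft-cloudbuild-XXXX \
  --format='yaml(substitutions)'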

In the Console, under Cloud Build | (4-projects build) | Settings, I see the following:

Cloud Build executes builds with the permissions granted to the Cloud Build service account tied to the project. You can grant additional roles to the service account to allow Cloud Build to interact with other GCP services.

Service account email: 7155XXXXXXX@cloudbuild.gserviceaccount.com

All the GCP services listed below the SA email (such as Compute, Cloud Build) are disabled. The 7155XX SA is not in gcloud projects get-iam-policy for my prj-p-shared-base-YYYY project. However, it is in the org-level IAM policy:

- members:
  - serviceAccount:7155XXXXXX@cloudbuild.gserviceaccount.com
  role: roles/serviceusage.serviceUsageConsumer

That alone doesn't seem like enough to grant the permissions Terraform needs. Also, the above SA does not match the value I set in common.auto.tfvars:

terraform_service_account = "org-terraform@cft-seed-ZZZZ.iam.gserviceaccount.com"
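As a sanity check on the impersonation chain (commands and expectations are illustrative): since the build runs gcloud config set auth/impersonate_service_account ${_TF_SA_EMAIL}, the Cloud Build SA needs roles/iam.serviceAccountTokenCreator on that Terraform SA.

gcloud iam service-accounts get-iam-policy \
  org-terraform@cft-seed-ZZZZ.iam.gserviceaccount.com \
  --format='yaml(bindings)'
# Look for a roles/iam.serviceAccountTokenCreator binding that includes
# serviceAccount:7155XXXXXXX@cloudbuild.gserviceaccount.com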

The "plan" step fails with the following:

The root module does not declare a variable named "python_interpreter_path"
but a value was found in file "common.auto.tfvars". To use this value, add a
"variable" block to the configuration.

Using a variables file to set an undeclared variable is deprecated and will
become an error in a future release. If you wish to provide certain "global"
settings to all configurations in your organization, use TF_VAR_...
environment variables to set these instead.

Error: Invalid template interpolation value

  on .terraform/modules/base_shared_vpc_project.project/modules/core_project_factory/locals.tf line 30, in locals:
  30:   preconditions_command = "${var.python_interpreter_path} ${local.preconditions_py_absolute_path} %{for key, value in local.attributes}--${key}=\"${value}\" %{endfor}"

However, the variable is declared in many places, such as 2-environments-modified/envs/production/.terraform/modules/env.restricted_shared_vpc_host_project/variables.tf, with a default value of "python3". To try to avoid this error, I added python_interpreter_path = "python3" to common.auto.tfvars and shared.auto.tfvars. The error remained.
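For reference, the "add a variable block" part of the warning is asking for a declaration in the root module being planned, roughly like this (a sketch only; run from the root module directory, path illustrative):

cat >> variables.tf <<'EOF'
variable "python_interpreter_path" {
  description = "Python interpreter path for the project factory preconditions check"
  type        = string
  default     = "python3"
}
EOF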

The error seemed to be complaining about base_shared_vpc_project.project, which was set up in 3-networks, so I went back to that repo, added python_interpreter_path = "python3" to its shared.auto.tfvars and common.auto.tfvars, and pushed to the production branch.

After that I triggered 4-projects-customized and, much to my surprise, it worked. The only other change I made was enabling the Service Accounts, Cloud Build, and Compute services in the Cloud Build settings. I'll leave this here in case it helps someone else debug their own issue.

In summary, I think the necessary changes were adding python_interpreter_path = "python3" to the 3-networks tfvars and enabling the additional services in the Cloud Build settings.

rjerrems commented 4 years ago

Thanks for the additional context @kbroughton. Given we don't have a super clear idea of the root issue and fix, I am going to close this for the time being.