kbroughton closed this issue 4 years ago
Hi @kbroughton, thanks for reporting the issue. Assuming you are running locally, can you please provide details of the following to help debug the issue?
It is a little odd that you would have only seen issues in the projects stage if steps 0-3 executed successfully, as they should be using the exact same version of the project factory module.
To be clear, I'm following the Cloud Build instructions; however, 0-bootstrap and 3-networks/envs/shared seemed to require a local terraform apply.
Terraform v0.12.29
gcloud version:
  Google Cloud SDK 307.0.0
  bq 2.0.59
  core 2020.08.21
  gsutil 4.53
OS: 10.15.6
No substantial changes to the foundation code, only *.tfvars changes. The only other thing I can think of: after running 3-networks/envs/shared on my laptop (there didn't seem to be another option) with Terraform 0.12.29 installed, I went to deploy 3-networks/envs/non-prod, prod, and dev using the git/Cloud Build flow, but got errors that some tfstate had been created with 0.12.29 while the current version was 0.12.24. I tracked that down to the following:
0-bootstrap/.terraform/modules/seed_bootstrap/modules/cloudbuild/cloudbuild_builder/Dockerfile:
  ARG TERRAFORM_VERSION=0.12.24
That file is only available after terraform init. So I built the image locally with the newer Terraform version and pushed it to the cft-cloudbuild terraform repo, overwriting the latest tag. This fixed the version mismatch, and I can't imagine it causing the permissions issues.
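For reference, a sketch of the rebuild-and-push, assuming the image lives at gcr.io/cft-cloudbuild-bae1/terraform (your seed project ID will differ); this is illustrative, not a definitive recipe:

```shell
# Rebuild the Cloud Build runner image with the Terraform version used locally,
# then overwrite the :latest tag that the build triggers pull.
# BUILDER_DIR is the path from the thread; IMAGE uses an illustrative project ID.
BUILDER_DIR=0-bootstrap/.terraform/modules/seed_bootstrap/modules/cloudbuild/cloudbuild_builder
IMAGE=gcr.io/cft-cloudbuild-bae1/terraform

docker build --build-arg TERRAFORM_VERSION=0.12.29 -t "${IMAGE}:latest" "${BUILDER_DIR}"
docker push "${IMAGE}:latest"
```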
Hi @kbroughton, a quick clarification
I believe all the steps 0-3 are now completed
Did this stage previously run and apply fine before?
It was a bit rocky, but I did get a successful terraform apply for stage and development on 0-3 (locally for 0-bootstrap and 3-networks/envs/shared, Cloud Build for the rest). The errors I'm facing now are from 4-projects; I've never had a working run for it.
Digging a little more on my issues deploying 4-projects...
I can dig into the Cloud Build run using the Dockerfile from 0-bootstrap above. The Dockerfile shows that the entrypoint is /builder/entrypoint.bash, which I can inspect inside the container:
4-projects-customized$ docker run -it --entrypoint '' -v $PWD:/data gcr.io/cft-cloudbuild-bae1/terraform bash
$ cat /builder/entrypoint.bash
active_account=""

function get-active-account() {
  active_account=$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2> /dev/null)
}

function activate-service-key() {
  rootdir=/root/.config/gcloud-config
  mkdir -p $rootdir
  tmpdir=$(mktemp -d "$rootdir/servicekey.XXXXXXXX")
  trap "rm -rf $tmpdir" EXIT
  echo ${GCLOUD_SERVICE_KEY} | base64 --decode -i > ${tmpdir}/gcloud-service-key.json
  gcloud auth activate-service-account --key-file ${tmpdir}/gcloud-service-key.json --quiet
  get-active-account
}
<snip>
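The key-handling step in activate-service-key can be reproduced in isolation with a dummy key, just to illustrate what the entrypoint does with GCLOUD_SERVICE_KEY (no real service account involved):

```shell
# Simulate the GCLOUD_SERVICE_KEY decode step from entrypoint.bash with dummy data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Cloud Build injects the key as base64; fake one from a placeholder JSON document.
GCLOUD_SERVICE_KEY=$(printf '{"type": "service_account"}' | base64)

# Same decode-and-write step the entrypoint performs before activating the account.
echo "${GCLOUD_SERVICE_KEY}" | base64 --decode > "${tmpdir}/gcloud-service-key.json"

decoded=$(cat "${tmpdir}/gcloud-service-key.json")
echo "${decoded}"
```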
So Cloud Build runs this container and grabs GCLOUD_SERVICE_KEY from the environment. The Cloud Build run is failing on the plan step, so I presume that is 4-projects/cloudbuild-tf-plan.yaml, shown below.
timeout: 1200s
substitutions:
  _POLICY_REPO: '' # add path to policies here https://github.com/forseti-security/policy-library/blob/master/docs/user_guide.md#how-to-use-terraform-validator
steps:
- id: 'setup'
  name: gcr.io/$PROJECT_ID/terraform
  entrypoint: /bin/bash
  args:
  - -c
  - |
      echo "Setting up gcloud for impersonation"
      gcloud config set auth/impersonate_service_account ${_TF_SA_EMAIL}
      echo "Adding bucket information to backends"
      for i in `find -name 'backend.tf'`; do sed -i 's/UPDATE_ME/${_STATE_BUCKET_NAME}/' $i; done
# [START tf-plan_validate_all]
- id: 'tf plan validate all'
  name: gcr.io/${PROJECT_ID}/terraform
  entrypoint: /bin/bash
  args:
  - -c
  - |
      ./tf-wrapper.sh plan_validate_all ${BRANCH_NAME} ${_POLICY_REPO}
artifacts:
  objects:
    location: 'gs://${_ARTIFACT_BUCKET_NAME}/terraform/cloudbuild/plan/${BUILD_ID}'
    paths: ['cloudbuild-tf-plan.yaml', 'tmp_plan/*.tfplan']
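The setup step's sed loop can be tried out safely against a scratch directory; the bucket name and directory layout below are made up, and the substitution variable is set by hand since there is no Cloud Build doing it for us:

```shell
# Recreate the 'setup' step's backend substitution outside of Cloud Build.
workdir=$(mktemp -d)
mkdir -p "${workdir}/business_unit_1/shared"
printf 'bucket = "UPDATE_ME"\n' > "${workdir}/business_unit_1/shared/backend.tf"

# In the real build this value comes from the _STATE_BUCKET_NAME substitution.
_STATE_BUCKET_NAME="bkt-example-tfstate"
for i in $(find "${workdir}" -name 'backend.tf'); do
  sed -i "s/UPDATE_ME/${_STATE_BUCKET_NAME}/" "$i"
done

result=$(cat "${workdir}/business_unit_1/shared/backend.tf")
echo "${result}"
```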
I can see the "echo" statements above in the Cloud Build log. In the GCP Console, under Cloud Build | Build Details | Execution Details | User Substitutions, I can see some of the environment variables, such as _TF_SA_EMAIL, which looks right (but not GCLOUD_SERVICE_KEY or PROJECT_ID).
In the Console under CloudBuild | (4-projects build) | Settings I see the following
Cloud Build executes builds with the permissions granted to the Cloud Build service account tied to the project. You can grant additional roles to the service account to allow Cloud Build to interact with other GCP services.
Service account email: 7155XXXXXXX@cloudbuild.gserviceaccount.com
All the GCP services listed below the SA email (such as Compute and Cloud Build) are disabled. The 7155XX SA is not in the gcloud projects get-iam-policy output for my prj-p-shared-base-YYYY project. However, it is in the org policy:
- members:
  - serviceAccount:7155XXXXXX@cloudbuild.gserviceaccount.com
  role: roles/serviceusage.serviceUsageConsumer
That alone doesn't seem enough to grant the permissions Terraform needs. Also, the above SA does not match the value I set in common.auto.tfvars:
terraform_service_account = "org-terraform@cft-seed-ZZZZ.iam.gserviceaccount.com"
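One way to double-check membership is to dump the IAM policy once and grep for the SA. The policy below is fabricated so the snippet is self-contained; in practice you would redirect the output of gcloud projects get-iam-policy (or gcloud organizations get-iam-policy) into the file, and the SA addresses are illustrative:

```shell
# Fabricated stand-in for `gcloud organizations get-iam-policy ORG_ID` output.
polfile=$(mktemp)
cat > "${polfile}" <<'EOF'
bindings:
- members:
  - serviceAccount:7155XXXXXX@cloudbuild.gserviceaccount.com
  role: roles/serviceusage.serviceUsageConsumer
- members:
  - serviceAccount:org-terraform@cft-seed-ZZZZ.iam.gserviceaccount.com
  role: roles/resourcemanager.projectCreator
EOF

# How many bindings mention the Cloud Build SA?
count=$(grep -c 'cloudbuild.gserviceaccount.com' "${polfile}")
echo "${count}"
```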
The "plan" step fails with the following:
The root module does not declare a variable named "python_interpreter_path"
but a value was found in file "common.auto.tfvars". To use this value, add a
"variable" block to the configuration.
Using a variables file to set an undeclared variable is deprecated and will
become an error in a future release. If you wish to provide certain "global"
settings to all configurations in your organization, use TF_VAR_...
environment variables to set these instead.
Error: Invalid template interpolation value

  on .terraform/modules/base_shared_vpc_project.project/modules/core_project_factory/locals.tf line 30, in locals:
  30: preconditions_command = "${var.python_interpreter_path} ${local.preconditions_py_absolute_path} %{for key, value in local.attributes}--${key}=\"${value}\" %{endfor}"
However, the variable is declared in many places, such as 2-environments-modified/envs/production/.terraform/modules/env.restricted_shared_vpc_host_project/variables.tf, with a default value of "python3". Attempting to avoid this error, I added python_interpreter_path = "python3" to common.auto.tfvars and shared.auto.tfvars. The error remained.
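For what it's worth, the fix the warning itself suggests is declaring the variable in the root module. I didn't end up needing this, but a sketch of what it would look like (the variables.tf location and description text are my own; the default mirrors the module's):

```shell
# Append a python_interpreter_path declaration to a root module's variables.tf
# (scratch directory here; in a real stage, run from the env directory).
workdir=$(mktemp -d)
cat >> "${workdir}/variables.tf" <<'EOF'
variable "python_interpreter_path" {
  description = "Python interpreter path for project-factory preconditions"
  type        = string
  default     = "python3"
}
EOF

# Confirm exactly one declaration was written.
count=$(grep -c '^variable "python_interpreter_path"' "${workdir}/variables.tf")
echo "${count}"
```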
The error seemed to be complaining about base_shared_vpc_project.project, which was set up in 3-networks, so I went back to that stage, added python_interpreter_path = "python3" to shared.auto.tfvars and common.auto.tfvars, and pushed on the production branch.
After that I triggered 4-projects-customized and, much to my surprise, it worked. The only other change I made was enabling the Service Accounts, Cloud Build, and Compute services in the Cloud Build settings. I'll leave this here in case it helps someone else debug their own issue.
In summary, I think the necessary changes were adding python_interpreter_path = "python3" to the 3-networks tfvars files and enabling the additional services in the Cloud Build settings.
Thanks for the additional context @kbroughton - given we don't have a super clear idea of the root issue & fix, I am going to close this for the time being.
I believe all the steps 0-3 are now completed
I'm hitting the following error in gcp-projects on the plan step of the production branch, running as org admin++, basically my superuser.
This appears to be coming from
If I git clone terraform-google-project-factory and cd to modules/core_project_factory/scripts/preconditions, I can run the check locally. With only the required BILLING_ACCOUNT and ORG_ID set, it returns nothing. Is that success or failure?
Adding in the optional variables, except impersonate_service_account and the credentials path, I get
This suggests it might be using the application_default_credentials.json. Perhaps I need to impersonate? However, that gives
It's not clear to me whether I should be using the cft-seed SA or the cft-cloudbuild SA, or whether my superuser credentials are getting downscoped during application_default_credentials creation. I tried regenerating the application default creds with
gcloud auth application-default login
with and without --impersonate-service-account, but there were other errors down that route. Any help would be appreciated. I tried turning on --verbose for the preconditions.py tool, hoping to get verbose SDK output, but that didn't work. I'm curious which credentials are being used in the checks and how to generate the correct creds.
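For anyone digging into the same question, these are the commands I'd use to see which identity the SDK resolves. The SA email is the one from my common.auto.tfvars, the ADC path is the Linux/macOS default, and flag support may depend on your SDK version, so treat this as a sketch:

```shell
# Which account is active in gcloud itself?
gcloud auth list --filter=status:ACTIVE --format="value(account)"

# Is an impersonation target configured in gcloud?
gcloud config get-value auth/impersonate_service_account

# What do the application default credentials currently contain?
cat ~/.config/gcloud/application_default_credentials.json

# Regenerate ADC, plain or impersonating the Terraform SA (SA email illustrative):
gcloud auth application-default login
gcloud auth application-default login \
  --impersonate-service-account=org-terraform@cft-seed-ZZZZ.iam.gserviceaccount.com
```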