google / exposure-notifications-server

Exposure Notification Reference Server | Covid-19 Exposure Notifications
https://www.google.com/covid19/exposurenotifications/
Apache License 2.0

Error waiting to create Connector google_vpc_access_connector #639

Closed madhavajay closed 4 years ago

madhavajay commented 4 years ago

I have tried deploying with Terraform to a brand-new GCP project.

I am using the following commit: ccc44a91dbb9624fbb064e500175399e2bb2b3dc

I have made the change to the database.tf file to fix the error from here: https://github.com/google/exposure-notifications-server/issues/542

But I am still getting the following errors:

Error: Error running command './../scripts/build': exit status 1. Output: ✋ Uncommitted local changes!

Error: Error waiting to create Connector: Error waiting for Creating Connector: Error code 7, message: Operation failed: Google APIs Service Agent (<PROJECT_NUMBER>@cloudservices.gserviceaccount.com) needs editor role in the project.

  on main.tf line 77, in resource "google_vpc_access_connector" "connector":
  77: resource "google_vpc_access_connector" "connector" {

However I can't seem to get the VPC error to go away.

I can confirm the @cloudservices.gserviceaccount.com account has editor permissions, and I tried adding owner just in case. From the definition:

resource "google_vpc_access_connector" "connector" {
  project       = data.google_project.project.project_id
  name          = "serverless-vpc-connector"
  region        = var.network_location
  network       = "default"
  ip_cidr_range = "10.8.0.0/28"

  depends_on = [
    google_project_service.services["compute.googleapis.com"],
    google_project_service.services["vpcaccess.googleapis.com"],
  ]
}

I can see that it uses vpcaccess.googleapis.com, and when I visit that API in the Cloud Console: https://console.cloud.google.com/apis/api/vpcaccess.googleapis.com/overview

It says: To use this API, you may need credentials. Click 'Create credentials' to get started.

When I follow the wizard, at the question "Are you planning to use this API with App Engine or Compute Engine?" it says that credentials are not needed. I assume that doesn't apply here because this is Cloud Run?

Either way, I tried creating a service account from that page and then activating it with:

$ gcloud auth activate-service-account --key-file=./service-account.json

Every time I delete the failed serverless VPC connector and try again, I get the same error.

Does anyone know what is wrong or have any idea about what the solution could be?

My colleague, who has the highest organisation-level permissions possible for the account, tried in a new project and got the same error, so I assume it's not a problem with my account permissions.

zssz commented 4 years ago

I'm having this issue too.

Related: Manually creating a new Serverless VPC access Connector is failing too, for my billing-enabled project.

Steps to reproduce:

  1. Go to https://console.cloud.google.com/networking/connectors/
  2. Delete the serverless-vpc-connector if there is one.
  3. Click CREATE CONNECTOR.
  4. Fill out the form with Name: serverless-vpc-connector, Network: default, IP Range: 10.8.0.0, Minimum throughput: 200, Maximum throughput: 1000.
  5. Click Create.

Expected result: The new connector named serverless-vpc-connector is created.

Actual result: The new connector named serverless-vpc-connector is not created and there's a ❗️ next to it in the list. On hover, the displayed error message is "Connector is in a bad state, manual deletion is recommended".

sethvargo commented 4 years ago

Hi @zssz and @madhavajay

Do you have any organizational policies or restrictions? The cloudservices account is automatically created and managed by Google (see more). If you deleted or modified those service accounts, most Google Cloud services will no longer work.

The first error is because your git tree is dirty. You must commit all files before you can build. This is to prevent dirty state from making its way into a build.

The second error is a bit bizarre. How have you authenticated to run Terraform? Are you using gcloud? Which commands have you run?

If you have other subnets or networks in the project, it's possible that you're overlapping IP space.

sethvargo commented 4 years ago

A few more questions:

  1. Have you changed the region in the configurations? Only these regions are allowed.
  2. Do you restrict the list of available GCE images? You must grant your project permission to use Compute Engine VM images from the project with ID serverless-vpc-access-images (see more).

madhavajay commented 4 years ago

@sethvargo To answer your questions:

  1. The project location is us-central1.
  2. The command is "terraform apply" after "terraform init".
  3. See screenshot: https://imgur.com/a/tdFEMs2

The git tree has changes due to the above-mentioned db issue patch. 71f5ac30922182cb5ed98610753ddff6343c42cb

This is a brand new project, so we haven't configured anything special. Additionally, earlier versions of this repo, from about 2 weeks ago, deployed without these issues to another project.

Also, just to note, the database gets created without any issue, so general permissions aren't the problem.

See VPC error here: https://imgur.com/a/N2LWOhf

These are the two error logs. I replaced the project ID, the project name, and my email.

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 3,
      "message": "BAD_REQUEST"
    },
    "authenticationInfo": {
      "principalEmail": "service-xxx@gcp-sa-vpcaccess.iam.gserviceaccount.com"
    },
    "requestMetadata": {
      "callerIp": "2002:a2b:fc53:0:b029:46:4ff5:a590",
      "callerSuppliedUserAgent": "Tesseract Google-API-Java-Client Google-HTTP-Java-Client/1.26.0-SNAPSHOT (gzip)"
    },
    "serviceName": "deploymentmanager.googleapis.com",
    "methodName": "v2.deploymentmanager.deployments.insert",
    "resourceName": "projects/PROJECT_ID/global/deployments/aet-uscentral1-serverless--vpc--connector",
    "request": {
      "@type": "type.googleapis.com/deploymentmanager.deployments.insert"
    }
  },
  "insertId": "-8blnlec3ig",
  "resource": {
    "type": "deployment",
    "labels": {
      "project_id": "PROJECT_ID",
      "name": "aet-uscentral1-serverless--vpc--connector"
    }
  },
  "timestamp": "2020-06-18T01:52:04.054Z",
  "severity": "ERROR",
  "logName": "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "operation-1592445020054-5a851ffbea2ca-75973959-c9fb40a7",
    "producer": "deploymentmanager.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2020-06-18T01:52:05.065320568Z"
}

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 7,
      "message": "Operation failed: Google APIs Service Agent (<PROJECT_NUMBER>@cloudservices.gserviceaccount.com) needs editor role in the project."
    },
    "authenticationInfo": {
      "principalEmail": "[my email address]"
    },
    "requestMetadata": {
      "requestAttributes": {},
      "destinationAttributes": {}
    },
    "serviceName": "vpcaccess.googleapis.com",
    "methodName": "google.cloud.vpcaccess.v1.VpcAccessService.CreateConnector",
    "resourceName": "projects/PROJECT_ID/locations/us-central1/connectors/serverless-vpc-connector"
  },
  "insertId": "jzr661bkq",
  "resource": {
    "type": "audited_resource",
    "labels": {
      "service": "vpcaccess.googleapis.com",
      "method": "google.cloud.vpcaccess.v1.VpcAccessService.CreateConnector",
      "project_id": "PROJECT_ID"
    }
  },
  "timestamp": "2020-06-18T01:52:09.346Z",
  "severity": "ERROR",
  "logName": "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "projects/PROJECT_ID/locations/us-central1/operations/5943cf0d-c9ca-4366-a57d-c674f569b3c2",
    "producer": "vpcaccess.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2020-06-18T01:52:09.571477817Z"
}

sethvargo commented 4 years ago

Brand new projects are still covered under organizational policies. This is difficult to debug because it works without errors on my two test domains.
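One organization policy that could plausibly interfere here is the trusted images constraint mentioned earlier in the thread. The following is a minimal Terraform sketch of allowing images from serverless-vpc-access-images at the project level, assuming that constraint is actually in force (nothing in this thread confirms it). The resource label `trusted_images` and the variable `project` are hypothetical names, not taken from the repository.

```hcl
# Hypothetical variable; the repository's own configuration may name this differently.
variable "project" {
  type        = string
  description = "ID of the project that hosts the VPC connector."
}

# Assumption: the organization enforces constraints/compute.trustedImageProjects.
# This allows Compute Engine images from the Serverless VPC Access image project.
resource "google_project_organization_policy" "trusted_images" {
  project    = var.project
  constraint = "compute.trustedImageProjects"

  list_policy {
    # Keep whatever the organization already allows and add to it.
    inherit_from_parent = true

    allow {
      values = ["projects/serverless-vpc-access-images"]
    }
  }
}
```

Applying this requires permission to set organization policies on the project, which a regular project editor may not have.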

madhavajay commented 4 years ago

@sethvargo Thanks for your help debugging this.

Do you know which organisational policy is preventing us from creating a serverless VPC connector, and why that would be a default? This whole org was created purely for this COVID project, so while there's a chance something is misconfigured, I would err on the side of assuming this is default settings all the way.

Does the use of this repo require some other configuration I have missed, such as allowing special VPC images or policies? And why is this happening now? Was there no VPC connector in earlier versions, given that they worked without this error?

Should we contact GCP support? Any chance you could help escalate? This is preventing us from updating our apps to match the latest server implementation over at: https://www.covid-watch.org/

madhavajay commented 4 years ago

Weird. Looks like @zssz solved it.

Seems we can only create the VPC connector with max_throughput = 300 and then everything works fine.

Any idea what that's about?

diff --git a/terraform/main.tf b/terraform/main.tf
index 75f2b38..ca11378 100644
--- a/terraform/main.tf
+++ b/terraform/main.tf
@@ -80,6 +80,7 @@ resource "google_vpc_access_connector" "connector" {
   region        = var.network_location
   network       = "default"
   ip_cidr_range = "10.8.0.0/28"
+  max_throughput= 300

   depends_on = [
     google_project_service.services["compute.googleapis.com"],

Apply complete! Resources: 20 added, 0 changed, 1 destroyed.

Outputs:

appengine_location = us-central
cloudrun_location = us-central1
cloudscheduler_location = us-central1
db_conn = xxx:us-central1:en-xxx
db_location = us-central1
db_name = main
db_pass_secret = projects/xxx/secrets/db-password/versions/1
db_user = notification
kms_location = us-central1
network_location = us-central1
project_id = xxx
project_number = xxx
region = us-central1

🎉

madhavajay commented 4 years ago

@sethvargo I made a PR: https://github.com/google/exposure-notifications-server/pull/644

I left the default at 1000, which is the value used when max_throughput is not specified, and noted our setting of 300 in the commit / PR description for now.
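For readers who want to try the workaround without editing main.tf each time, here is a rough sketch of exposing the value as a variable. It is not necessarily what the PR does: the variable name `vpc_access_connector_max_throughput` is hypothetical, and `data.google_project.project`, `var.network_location`, and `google_project_service.services` are assumed to come from the repository's existing configuration, as in the resource quoted at the top of this thread.

```hcl
# Hypothetical variable name; not taken from the repository or the PR.
variable "vpc_access_connector_max_throughput" {
  type        = number
  default     = 1000 # the value the thread reports as the default when unset
  description = "Maximum throughput of the Serverless VPC Access connector, in Mbps."
}

resource "google_vpc_access_connector" "connector" {
  project        = data.google_project.project.project_id
  name           = "serverless-vpc-connector"
  region         = var.network_location
  network        = "default"
  ip_cidr_range  = "10.8.0.0/28"
  max_throughput = var.vpc_access_connector_max_throughput

  depends_on = [
    google_project_service.services["compute.googleapis.com"],
    google_project_service.services["vpcaccess.googleapis.com"],
  ]
}
```

With this in place, the setting that worked for us could be passed as `terraform apply -var="vpc_access_connector_max_throughput=300"`, leaving the default untouched for everyone else.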