skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] GCP: Caller does not have required permission to use project #784

Open ephemer opened 1 year ago

ephemer commented 1 year ago

Describe the bug

I am trying to set up Skyplane to copy from gs:// to s3://. Right after the "Installing Gateway Package" step, where the actual transfer appears to begin, I get a 403 error: "Caller does not have required permission to use project <GCP PROJECT ID WHERE BUCKET IS LOCATED>. Grant the caller the roles/serviceusage.serviceUsageConsumer role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=<GCP PROJECT ID>."

My GCP credentials are linked to the account owner, and the Service Account was created by Skyplane itself. There is no indication of who the "caller" is, or to whom I need to grant those roles.

I already granted that role to both the admin account (the one logged in to gcloud) and the Skyplane "manual" Service Account that was created during skyplane init. I waited several hours after doing that and tried again, and it still isn't working. I can't think of any other "caller" to grant permissions to.

To Reproduce

Steps to reproduce the behavior (please include the full Skyplane command you ran):

  1. pip install skyplane[aws,gcp]
  2. skyplane init (aws and gcloud are already set up, gcloud is logged in and set to the correct project)
  3. Set gcp_instance_class to n2-standard-8 because we can't raise our quota to allow more than 8 N2 cores
    • This is why we're leaving GCP – we have been customers for years but they won't grant us quota for any GPUs or more than the default number of CPU cores, and won't tell us why not, other than "you need more billing history".
  4. skyplane cp --recursive --reuse-gateways --confirm gs://my_bucket s3://my-bucket
    • Without --reuse-gateways, I received timeouts while waiting for the AWS EC2 instance to come online
  5. See error

Expected behavior

The transfer works.

Screenshots

❌ GCPServer(region_tag=gcp:us-west1-b, instance_name=skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422) encountered error:
Traceback (most recent call last):
  File "/pkg/skyplane/gateway/gateway_obj_store.py", line 49, in get_obj_store_interface
    self.obj_store_interfaces[key] = ObjectStoreInterface.create(region, bucket)
  File "/pkg/skyplane/obj_store/object_store_interface.py", line 102, in create
    return GCSInterface(bucket)
  File "/pkg/skyplane/obj_store/gcs_interface.py", line 26, in __init__
    self._gcs_client = self.auth.get_storage_client()
  File "/pkg/skyplane/utils/imports.py", line 33, in wrapped
    return fn(*modules_imported, *args, **kwargs)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 200, in get_storage_client
    return storage.Client.from_service_account_json(self.service_account_credentials)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 68, in service_account_credentials
    self._service_account_email = self.create_service_account(self.service_account_name)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 135, in create_service_account
    service_accounts = service.projects().serviceAccounts().list(name="projects/" + self.project_id).execute()["accounts"]
  File "/usr/local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/googleapiclient/http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://iam.googleapis.com/v1/projects/note-detection-277711/serviceAccounts?alt=json returned "Caller does not have required permission to use project note-detection-277711. Grant the caller the roles/serviceusage.serviceUsageConsumer role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=note-detection-277711 and then retry. Propagation of the new permission may take a few minutes.". Details: "[{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Google developer console IAM admin', 'url': 'https://console.developers.google.com/iam-admin/iam/project?project=note-detection-277711'}]}, {'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'USER_PROJECT_DENIED', 'domain': 'googleapis.com', 'metadata': {'service': 'iam.googleapis.com', 'consumer': 'projects/note-detection-277711'}}]">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pkg/skyplane/gateway/gateway_obj_store.py", line 121, in worker_loop
    obj_store_interface = self.get_obj_store_interface(chunk_req.src_region, bucket)
  File "/pkg/skyplane/gateway/gateway_obj_store.py", line 51, in get_obj_store_interface
    raise ValueError(f"Failed to create obj store interface {str(e)}")
ValueError: Failed to create obj store interface <HttpError 403 when requesting https://iam.googleapis.com/v1/projects/note-detection-277711/serviceAccounts?alt=json returned "Caller does not have required permission to use project note-detection-277711. Grant the caller the roles/serviceusage.serviceUsageConsumer role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=note-detection-277711 and then retry. Propagation of the new permission may take a few minutes.". Details: "[{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Google developer console IAM admin', 'url': 'https://console.developers.google.com/iam-admin/iam/project?project=note-detection-277711'}]}, {'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'USER_PROJECT_DENIED', 'domain': 'googleapis.com', 'metadata': {'service': 'iam.googleapis.com', 'consumer': 'projects/note-detection-277711'}}]">
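The Details payload in the error above carries structured fields that are easy to miss in the wall of text. As a minimal sketch (the helper name `summarize_error_info` is hypothetical, and the payload is copied verbatim from the traceback), the following pulls out the `google.rpc.ErrorInfo` entry. Note that the payload names the API being called and the consumer project, but not the identity of the caller.

```python
import ast

# Details payload copied from the HttpError above (a Python-literal list of dicts).
DETAILS = (
    "[{'@type': 'type.googleapis.com/google.rpc.Help', 'links': "
    "[{'description': 'Google developer console IAM admin', 'url': "
    "'https://console.developers.google.com/iam-admin/iam/project?project=note-detection-277711'}]}, "
    "{'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'USER_PROJECT_DENIED', "
    "'domain': 'googleapis.com', 'metadata': {'service': 'iam.googleapis.com', "
    "'consumer': 'projects/note-detection-277711'}}]"
)


def summarize_error_info(details_text):
    """Return reason, service and consumer from the first ErrorInfo entry, or None."""
    details = ast.literal_eval(details_text)  # safe parse of the literal list
    for entry in details:
        if entry.get("@type", "").endswith("google.rpc.ErrorInfo"):
            metadata = entry.get("metadata", {})
            return {
                "reason": entry.get("reason"),
                "service": metadata.get("service"),
                "consumer": metadata.get("consumer"),
            }
    return None


if __name__ == "__main__":
    print(summarize_error_info(DETAILS))
```

Here the reason is USER_PROJECT_DENIED for the service iam.googleapis.com, which matches the failing `serviceAccounts().list` call in the traceback.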

Transfer client log

In the log output from Skyplane, please upload the debug log from the CLI. You can find the path to the file in the log output:

23:31:23 [DEBUG] [AWS] Found existing rule for 0.0.0.0/0:22 in skyplane, not adding again
23:31:26 [DEBUG] [wait_for] Waiting fn=<function AWSCloudProvider.provision_instance.<locals>.check_iam_role at 0x157759af0> completed in 0.55s
23:31:26 [DEBUG] [wait_for] Waiting fn=<function AWSCloudProvider.provision_instance.<locals>.check_instance_profile at 0x16ba963a0> completed in 0.16s
23:31:34 [DEBUG] [wait_for] Wait for RUNNING status on gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 fn=<function GCPCloudProvider.provision_instance.<locals>.<lambda> at 0x1694c8940> completed in 1.84s
23:31:51 [DEBUG] [wait_for] Waiting for gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 to be ready fn=<function Server.wait_for_ssh_ready.<locals>.is_up at 0x1694c8940> completed in 16.97s
23:31:54 [DEBUG] [wait_for] Waiting for aws:eu-west-1:i-0a77ea9d3ca45909e to be ready fn=<function Server.wait_for_ssh_ready.<locals>.is_up at 0x168c111f0> completed in 0.43s
23:31:56 [DEBUG] [AWS] Adding IPs ['3.252.227.2', '35.197.85.49'] to security group skyplane
23:32:01 [DEBUG] [GCP] Created new firewall skyplane32522272
23:32:07 [DEBUG] [GCP] Created new firewall skyplane1013805
23:32:07 [WARN]  Using BBR, make sure you indend to!
23:32:07 [WARN]  Using BBR, make sure you indend to!
23:32:15 [DEBUG] Starting gateway aws:eu-west-1:i-0a77ea9d3ca45909e, host: 3.252.227.2 docker pull in 6.390626907348633
23:32:15 [DEBUG] Starting gateway aws:eu-west-1:i-0a77ea9d3ca45909e, host: 3.252.227.2: Starting gateway container
23:32:16 [INFO]  Starting gateway aws:eu-west-1:i-0a77ea9d3ca45909e, host: 3.252.227.2: sudo docker run -d --log-driver=local --log-opt max-file=16 --ipc=host --network=host --ulimit nofile=1048576 --mount type=tmpfs,dst=/skyplane,tmpfs-size=$(($(free -b  | head -n2 | tail -n1 | awk '{print $2}')/2)) -v /tmp/config:/pkg/data/config -v /tmp/e2ee_key:/pkg/data/e2ee_key --env SKYPLANE_IS_GATEWAY=1 --env SKYPLANE_CONFIG=/pkg/data/config --env AWS_METADATA_SERVICE_NUM_ATTEMPTS=4 --env AWS_METADATA_SERVICE_TIMEOUT=10 --env AWS_DEFAULT_REGION=eu-west-1 --env E2EE_KEY_FILE=/pkg/data/e2ee_key --name skyplane_gateway public.ecr.aws/s6m1p0n8/skyplane:0.2.1 /bin/bash -c "/etc/init.d/stunnel4 start && python -u /pkg/skyplane/gateway/gateway_daemon.py --chunk-dir /skyplane/chunks --outgoing-ports '{}' --region aws:eu-west-1 --use-compression  --disable-tls"
23:32:20 [DEBUG] Starting gateway aws:eu-west-1:i-0a77ea9d3ca45909e, host: 3.252.227.2: Gateway started 1db81823943c8a2446730972079e57bd49e43014adda08ef42c58f0067ceb7a1
23:32:20 [DEBUG] Starting gateway gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, host: 35.197.85.49 docker pull in 7.647210121154785
23:32:20 [DEBUG] Starting gateway gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, host: 35.197.85.49: Starting gateway container
23:32:20 [DEBUG] Bound remote port aws:eu-west-1:i-0a77ea9d3ca45909e:8888 to localhost:49209
23:32:20 [DEBUG] aws:eu-west-1:i-0a77ea9d3ca45909e log_viewer_url = http://127.0.0.1:49209/container/1db81823943c
23:32:21 [DEBUG] Bound remote port aws:eu-west-1:i-0a77ea9d3ca45909e:8081 to localhost:49211
23:32:21 [DEBUG] aws:eu-west-1:i-0a77ea9d3ca45909e gateway_api_url = http://127.0.0.1:49211
23:32:21 [DEBUG] [wait_for] Waiting for gateway aws:eu-west-1:i-0a77ea9d3ca45909e to start fn=<function Server.start_gateway.<locals>.is_api_ready at 0x16ba06d30> completed in 0.18s
23:32:26 [INFO]  Starting gateway gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, host: 35.197.85.49: sudo docker run -d --log-driver=local --log-opt max-file=16 --ipc=host --network=host --ulimit nofile=1048576 --mount type=tmpfs,dst=/skyplane,tmpfs-size=$(($(free -b  | head -n2 | tail -n1 | awk '{print $2}')/2)) -v /tmp/config:/pkg/data/config -v /tmp/service_account_key.json:/pkg/data/service_account_key.json -v /tmp/e2ee_key:/pkg/data/e2ee_key --env SKYPLANE_IS_GATEWAY=1 --env SKYPLANE_CONFIG=/pkg/data/config --env GCP_SERVICE_ACCOUNT_FILE=/pkg/data/service_account_key.json --env E2EE_KEY_FILE=/pkg/data/e2ee_key --name skyplane_gateway public.ecr.aws/s6m1p0n8/skyplane:0.2.1 /bin/bash -c "/etc/init.d/stunnel4 start && python -u /pkg/skyplane/gateway/gateway_daemon.py --chunk-dir /skyplane/chunks --outgoing-ports '{\"3.252.227.2\": 32}' --region gcp:us-west1-b --use-compression  --disable-tls"
23:32:31 [DEBUG] Starting gateway gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, host: 35.197.85.49: Gateway started 4cc85eb0d9b4ff850f8eeba4ff2bb15ece20fda5e75f906c2c12d84fc48af021
23:32:33 [DEBUG] Bound remote port gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422:8888 to localhost:49215
23:32:33 [DEBUG] gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 log_viewer_url = http://127.0.0.1:49215/container/4cc85eb0d9b4
23:32:34 [DEBUG] Bound remote port gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422:8081 to localhost:49217
23:32:34 [DEBUG] gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 gateway_api_url = http://127.0.0.1:49217
23:32:35 [DEBUG] [wait_for] Waiting for gateway gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 to start fn=<function Server.start_gateway.<locals>.is_api_ready at 0x16a040b80> completed in 0.67s
23:32:35 [INFO]  Log URLs for aws:eu-west-1:i-0a77ea9d3ca45909e (aws:eu-west-1:0)
23:32:35 [INFO]     Log viewer: http://127.0.0.1:49209/container/1db81823943c
23:32:35 [INFO]     API: http://127.0.0.1:49211
23:32:35 [INFO]  Log URLs for gcp:us-west1-b:skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422 (gcp:us-west1-b:0)
23:32:35 [INFO]     Log viewer: http://127.0.0.1:49215/container/4cc85eb0d9b4
23:32:35 [INFO]     API: http://127.0.0.1:49217
23:32:35 [DEBUG] initiate_multipart_transfers: 0.00s
23:32:35 [INFO]  Batch 0 size: 100954957834 with 169349 chunks
23:32:36 [DEBUG] Batch 0 size: 100954957834 with 169349 chunks
23:32:45 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 152965 remaining
23:32:49 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 136581 remaining
23:32:54 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 120197 remaining
23:32:58 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 103813 remaining
23:33:02 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 87429 remaining
23:33:06 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 71045 remaining
23:33:11 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 54661 remaining
23:33:15 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 38277 remaining
23:33:19 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 21893 remaining
23:33:24 [DEBUG] Sent 16384 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 5509 remaining
23:33:26 [DEBUG] Sent 5509 chunk requests to skyplane-gcp-c2f5abde739641a2ae6eb5d5bd7c6422, 0 remaining
23:33:26 [DEBUG] Building chunk requests: 50.15s

Environment info (please complete the following information):

ephemer commented 1 year ago

I just added the Service Usage Admin role to the skyplane service account and now I get a different error:

❌ GCPServer(region_tag=gcp:us-west1-b, instance_name=skyplane-gcp-410fdbe8814c4fb9b02f089f18d86144) encountered error:
Traceback (most recent call last):
  File "/pkg/skyplane/gateway/gateway_obj_store.py", line 49, in get_obj_store_interface
    self.obj_store_interfaces[key] = ObjectStoreInterface.create(region, bucket)
  File "/pkg/skyplane/obj_store/object_store_interface.py", line 102, in create
    return GCSInterface(bucket)
  File "/pkg/skyplane/obj_store/gcs_interface.py", line 26, in __init__
    self._gcs_client = self.auth.get_storage_client()
  File "/pkg/skyplane/utils/imports.py", line 33, in wrapped
    return fn(*modules_imported, *args, **kwargs)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 200, in get_storage_client
    return storage.Client.from_service_account_json(self.service_account_credentials)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 68, in service_account_credentials
    self._service_account_email = self.create_service_account(self.service_account_name)
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 179, in create_service_account
    return retry_backoff(read_modify_write)  # retry loop needed for concurrent policy modifications
  File "/pkg/skyplane/utils/retry.py", line 30, in retry_backoff
    raise e
  File "/pkg/skyplane/utils/retry.py", line 27, in retry_backoff
    return fn()
  File "/pkg/skyplane/compute/gcp/gcp_auth.py", line 158, in read_modify_write
    policy = service.projects().getIamPolicy(resource=self.project_id).execute()
  File "/usr/local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/googleapiclient/http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://cloudresourcemanager.googleapis.com/v1/projects/note-detection-277711:getIamPolicy?alt=json returned "The caller does not have permission". Details: "The caller does not have permission">

How do I find out who the "caller" is in this case so I can provide the correct permissions?
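One way to narrow this down is to ask the Compute Engine metadata server which service account is attached to the gateway VM. This assumes the failing request is made from the gateway VM using the VM's attached credentials, which this thread does not confirm. The sketch below uses only the standard library and would need to be run on the gateway instance itself (for example over SSH); the function names are hypothetical.

```python
import urllib.request

# Standard Compute Engine metadata endpoint for the VM's attached service account.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_identity_request(url=METADATA_URL):
    """Build the metadata request; the Metadata-Flavor header is required."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def attached_service_account_email(timeout=5.0):
    """Return the email of the service account attached to this VM.

    Only works when executed on a Compute Engine instance.
    """
    with urllib.request.urlopen(build_identity_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


if __name__ == "__main__":
    print(attached_service_account_email())
```

If the printed email differs from the service account that Skyplane created, that attached account is a candidate for the "caller" in the error.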

ephemer commented 1 year ago

I even tried giving the skyplane manual Service Account Owner permissions and it still fails with the above error.

I have now given the "Compute Engine default service account" Owner permissions too, and it appears to be working. I am going to leave this ticket open because that doesn't seem like an ideal state of affairs. It would be good to understand what is really needed here and why skyplane cloud --check-gcp doesn't pick up on the missing permissions.
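As a rough guide to what is "really needed", the two tracebacks above imply at least three permissions. The sketch below lists them and builds the request body for Cloud Resource Manager's `projects.testIamPermissions` method, which reports which of the listed permissions the calling identity holds. The helper name is hypothetical, the list is inferred from the tracebacks rather than from Skyplane's documentation, and it is likely incomplete: the `read_modify_write` step presumably goes on to write the policy, but the traceback never reached that point.

```python
import json

# Permissions implied by the two failing calls in the tracebacks (an inference,
# not an authoritative list of what Skyplane requires).
REQUIRED_PERMISSIONS = {
    "serviceusage.services.use": "named in the first 403 (USER_PROJECT_DENIED)",
    "iam.serviceAccounts.list": "serviceAccounts().list() in create_service_account",
    "resourcemanager.projects.getIamPolicy": "getIamPolicy() in read_modify_write",
}


def build_permission_check_body(permissions=REQUIRED_PERMISSIONS):
    """Build the JSON body for a projects.testIamPermissions request."""
    return {"permissions": sorted(permissions)}


if __name__ == "__main__":
    # Sending this body requires an authenticated client running as the identity
    # under test; that part is intentionally left out of this sketch.
    print(json.dumps(build_permission_check_body(), indent=2))
```

Running the check as the gateway's identity, rather than as the project owner, is what would show which permissions are missing, and it avoids granting Owner just to find out.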

sarahwooders commented 1 year ago

Hi @ephemer - thanks for reporting this issue. It looks like the problem is with listing the existing service accounts in your project. Does the account you authenticate with when you run gcloud auth application-default login have permission to list service accounts in the GCP project you're using?

Also, would you be able to check whether you still have this issue with Skyplane 0.3.1? We fixed a couple of authentication issues in #757. 0.3.1 should also let you explicitly set the GCP project you want to use.

sarahwooders commented 1 year ago

@ephemer were you able to resolve this issue?