Open andylizf opened 1 week ago
When running sky serve up examples/serve/http_server/task.yaml -n new-http --cloud gcp, the command keeps retrying due to a GCP permission error:
sky serve up examples/serve/http_server/task.yaml -n new-http --cloud gcp
Required 'compute.images.useReadOnly' permission for 'projects/sky-dev-465/global/images/skypilot-gcp-cpu-ubuntu-20241017184242'
The command retries indefinitely. This is likely a temporary issue with GCP permissions.
Partial logs:
D 11-10 21:02:03 provisioner.py:135] SkyPilot version: 1.0.0-dev0; commit: 1f25cd36cd76e7f3380f2cb80d0c33a1cf632f94
D 11-10 21:02:03 provisioner.py:137]
D 11-10 21:02:03 provisioner.py:137]
D 11-10 21:02:03 provisioner.py:137] ==================== Provisioning ====================
D 11-10 21:02:03 provisioner.py:137]
D 11-10 21:02:03 provisioner.py:138] Provision config:
D 11-10 21:02:03 provisioner.py:138] {
D 11-10 21:02:03 provisioner.py:138] "provider_config": {
D 11-10 21:02:03 provisioner.py:138] "type": "external",
D 11-10 21:02:03 provisioner.py:138] "module": "sky.provision.gcp",
D 11-10 21:02:03 provisioner.py:138] "region": "us-central1",
D 11-10 21:02:03 provisioner.py:138] "availability_zone": "us-central1-a",
D 11-10 21:02:03 provisioner.py:138] "cache_stopped_nodes": true,
D 11-10 21:02:03 provisioner.py:138] "project_id": "psychic-order-437203-r7",
D 11-10 21:02:03 provisioner.py:138] "firewall_rule": "sky-ports-sky-serve-controller-6eabc0cb-6eab",
D 11-10 21:02:03 provisioner.py:138] "use_internal_ips": false,
D 11-10 21:02:03 provisioner.py:138] "force_enable_external_ips": false,
D 11-10 21:02:03 provisioner.py:138] "disable_launch_config_check": true,
D 11-10 21:02:03 provisioner.py:138] "use_managed_instance_group": false
D 11-10 21:02:03 provisioner.py:138] },
D 11-10 21:02:03 provisioner.py:138] "authentication_config": {
D 11-10 21:02:03 provisioner.py:138] "ssh_user": "gcpuser",
D 11-10 21:02:03 provisioner.py:138] "ssh_private_key": "~/.ssh/sky-key"
D 11-10 21:02:03 provisioner.py:138] },
D 11-10 21:02:03 provisioner.py:138] "docker_config": {},
D 11-10 21:02:03 provisioner.py:138] "node_config": {
D 11-10 21:02:03 provisioner.py:138] "labels": {
D 11-10 21:02:03 provisioner.py:138] "skypilot-user": "andyl",
D 11-10 21:02:03 provisioner.py:138] "use-managed-instance-group": "0"
D 11-10 21:02:03 provisioner.py:138] },
D 11-10 21:02:03 provisioner.py:138] "machineType": "n2-standard-4",
D 11-10 21:02:03 provisioner.py:138] "disks": [
D 11-10 21:02:03 provisioner.py:138] {
D 11-10 21:02:03 provisioner.py:138] "boot": true,
D 11-10 21:02:03 provisioner.py:138] "autoDelete": true,
D 11-10 21:02:03 provisioner.py:138] "type": "PERSISTENT",
D 11-10 21:02:03 provisioner.py:138] "initializeParams": {
D 11-10 21:02:03 provisioner.py:138] "diskSizeGb": 200,
D 11-10 21:02:03 provisioner.py:138] "sourceImage": "projects/sky-dev-465/global/images/skypilot-gcp-cpu-ubuntu-20241017184242",
D 11-10 21:02:03 provisioner.py:138] "diskType": "zones/us-central1-a/diskTypes/pd-balanced"
D 11-10 21:02:03 provisioner.py:138] }
D 11-10 21:02:03 provisioner.py:138] }
D 11-10 21:02:03 provisioner.py:138] ],
D 11-10 21:02:03 provisioner.py:138] "metadata": {
D 11-10 21:02:03 provisioner.py:138] "items": [
D 11-10 21:02:03 provisioner.py:138] {
D 11-10 21:02:03 provisioner.py:138] "key": "ssh-keys",
D 11-10 21:02:03 provisioner.py:138] "value": "gcpuser:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6nCRL1m/qbnjCm/9uF91bsQXtyMmDB/JCvHBs19bKJfTa5N/lW+SetSqKox+63QIuH2hfK9x7cs5a4BWLDmGFfXg/PobmcY31jv6hlM8oaXwulJqQnW7oww0SdjlFrJ5XjMtm2eFAZ5r85NGPgEI8PcvwzUqGkPqhsrYYY7hMG5A/WfSFMSkZGoRMjkxo+mpHSV08SyzI/xO7kTYuA7GUs9VbrErODptxiSWiisD39MUUiAKtU7kCVRKw4iE8KnWb0vwiZN4Skkg9yDMf9sr8iAQmR2y9RvyY3JtxmgGosTMWGZ0E5oosyLEUbHXsa++u2alAhKDqfn3jXAaCfEUd"
D 11-10 21:02:03 provisioner.py:138] }
D 11-10 21:02:03 provisioner.py:138] ]
D 11-10 21:02:03 provisioner.py:138] }
D 11-10 21:02:03 provisioner.py:138] },
D 11-10 21:02:03 provisioner.py:138] "count": 1,
D 11-10 21:02:03 provisioner.py:138] "tags": {},
D 11-10 21:02:03 provisioner.py:138] "resume_stopped_nodes": true,
D 11-10 21:02:03 provisioner.py:138] "ports_to_open_on_launch": null
D 11-10 21:02:03 provisioner.py:138] }
D 11-10 21:02:03 config.py:117] gcp_credentials not found in cluster yaml file. Falling back to GOOGLE_APPLICATION_CREDENTIALS environment variable.
I 11-10 21:02:06 config.py:217] _configure_iam_role: Checking permissions for skypilot-v1@psychic-order-437203-r7.iam.gserviceaccount.com...
I 11-10 21:02:07 config.py:613] get_usable_vpc: Found a usable VPC network 'default'.
I 11-10 21:02:09 instance.py:212] []
D 11-10 21:02:09 instance_utils.py:802] Launching GCP instances with "bulkInsert" ...
D 11-10 21:02:10 instance_utils.py:851] create_instances: googleapiclient.errors.HttpError:
W 11-10 21:02:10 instance_utils.py:112] Got return code 'forbidden' in us-central1-a: "Required 'compute.images.useReadOnly' permission for 'projects/sky-dev-465/global/images/skypilot-gcp-cpu-ubuntu-20241017184242'"
D 11-10 21:02:10 provisioner.py:150] Failed to provision 'sky-serve-controller-6eabc0cb' on GCP (us-central1-a).
D 11-10 21:02:10 provisioner.py:152] bulk_provision for 'sky-serve-controller-6eabc0cb' failed. Stacktrace:
D 11-10 21:02:10 provisioner.py:152] Traceback (most recent call last):
D 11-10 21:02:10 provisioner.py:152] File "/home/andyl/skypilot/sky/provision/provisioner.py", line 141, in bulk_provision
D 11-10 21:02:10 provisioner.py:152] return _bulk_provision(cloud, region, cluster_name,
D 11-10 21:02:10 provisioner.py:152] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
D 11-10 21:02:10 provisioner.py:152] File "/home/andyl/skypilot/sky/provision/provisioner.py", line 63, in _bulk_provision
D 11-10 21:02:10 provisioner.py:152] provision_record = provision.run_instances(provider_name,
D 11-10 21:02:10 provisioner.py:152] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
D 11-10 21:02:10 provisioner.py:152] File "/home/andyl/skypilot/sky/provision/__init__.py", line 50, in _wrapper
D 11-10 21:02:10 provisioner.py:152] return impl(*args, **kwargs)
D 11-10 21:02:10 provisioner.py:152] ^^^^^^^^^^^^^^^^^^^^^
D 11-10 21:02:10 provisioner.py:152] File "/home/andyl/skypilot/sky/provision/gcp/instance.py", line 360, in run_instances
D 11-10 21:02:10 provisioner.py:152] return _run_instances(region, cluster_name_on_cloud, config)
D 11-10 21:02:10 provisioner.py:152] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
D 11-10 21:02:10 provisioner.py:152] File "/home/andyl/skypilot/sky/provision/gcp/instance.py", line 301, in _run_instances
D 11-10 21:02:10 provisioner.py:152] raise error
D 11-10 21:02:10 provisioner.py:152] sky.provision.common.ProvisionerError: Failed to launch instances.
D 11-10 21:02:10 provisioner.py:152]
D 11-10 21:02:10 provisioner.py:157] Stopping the failed cluster.
D 11-10 21:02:10 instance.py:36] handlers: []
D 11-10 21:02:11 instance.py:47] handler_to_instances: defaultdict(, {})
D 11-10 21:02:11 instance.py:36] handlers: dict_keys([])
D 11-10 21:02:11 instance.py:47] handler_to_instances: defaultdict(, {})
D 11-10 21:02:51 provisioner.py:135] SkyPilot version: 1.0.0-dev0; commit: 1f25cd36cd76e7f3380f2cb80d0c33a1cf632f94
D 11-10 21:02:51 provisioner.py:137]
D 11-10 21:02:51 provisioner.py:137]
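The log shows a 'forbidden' return code at 21:02:10 followed by a fresh provisioning attempt at 21:02:51. For discussion, here is a minimal sketch of a bounded retry loop that gives up immediately on permission errors. It is not SkyPilot's actual retry code: `ProvisionError`, `NON_RETRYABLE_CODES`, and `provision_with_retry` are hypothetical names invented for this illustration, and treating 'forbidden' as non-retryable is an assumption about the desired behavior.

```python
import time
from typing import Callable

# Hypothetical set of error codes that retrying is assumed not to fix.
NON_RETRYABLE_CODES = {"forbidden"}


class ProvisionError(Exception):
    """Hypothetical error carrying a cloud return code such as 'forbidden'."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def provision_with_retry(launch: Callable[[], str],
                         max_attempts: int = 3,
                         backoff_seconds: float = 0.0) -> str:
    """Call launch() until it succeeds, re-raising early on permanent errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except ProvisionError as e:
            # Re-raise on a non-retryable code or when attempts are exhausted.
            if e.code in NON_RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

With a loop shaped like this, the 'compute.images.useReadOnly' failure would reach the user after the first attempt, while other error codes would still be retried up to `max_attempts` times.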