skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

Authentication fails after deleting an existing sky-key #1036

Closed: Michaelvll closed this issue 1 year ago

Michaelvll commented 2 years ago
> sky launch -y -c test-huggingface-9ce1ce58-61 examples/huggingface_glue_imdb_app.yaml
Traceback (most recent call last):
  File "/Users/zhwu/miniconda3/envs/sky-dev/bin/sky", line 33, in <module>
    sys.exit(load_entry_point('skypilot', 'console_scripts', 'sky')())
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 108, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/cli.py", line 776, in invoke
    return super().invoke(ctx)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/cli.py", line 909, in launch
    _launch_with_confirm(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/cli.py", line 464, in _launch_with_confirm
    sky.launch(dag,
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/execution.py", line 212, in launch
    _execute(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/execution.py", line 139, in _execute
    handle = backend.provision(task,
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 108, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/backend.py", line 49, in provision
    return self._provision(task, to_provision, dryrun, stream_logs,
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/cloud_vm_ray_backend.py", line 1543, in _provision
    config_dict = provisioner.provision_with_retries(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/cloud_vm_ray_backend.py", line 1238, in provision_with_retries
    config_dict = self._retry_region_zones(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/cloud_vm_ray_backend.py", line 886, in _retry_region_zones
    config_dict = backend_utils.write_cluster_config(
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/utils/common_utils.py", line 129, in _record
    return f(*args, **kwargs)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/backend_utils.py", line 645, in write_cluster_config
    _add_auth_to_cluster_config(cloud, yaml_path)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/backends/backend_utils.py", line 687, in _add_auth_to_cluster_config
    config = auth.setup_gcp_authentication(config)
  File "/Users/zhwu/Library/CloudStorage/OneDrive-Personal/AResource/PhD/Research/sky-computing/code/sky-experiment-dev/sky/authentication.py", line 294, in setup_gcp_authentication
    operation = compute.projects().setCommonInstanceMetadata(
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 131, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/Users/zhwu/miniconda3/envs/sky-dev/lib/python3.8/site-packages/googleapiclient/http.py", line 937, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 412 when requesting https://compute.googleapis.com/compute/v1/projects/intercloud-320520/setCommonInstanceMetadata?alt=json returned "Supplied fingerprint does not match current metadata fingerprint.". Details: "[{'message': 'Supplied fingerprint does not match current metadata fingerprint.', 'domain': 'global', 'reason': 'conditionNotMet', 'location': 'If-Match', 'locationType': 'header'}]">
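The 412 response comes from Compute Engine's optimistic-concurrency check. A `setCommonInstanceMetadata` request must carry the fingerprint of the project metadata it replaces, and the request is rejected if that metadata has changed since the fingerprint was read. A common way to make such an update robust is to re-read the metadata and retry on a conflict. The sketch below models that pattern with a hypothetical in-memory store. `FakeProjectMetadata`, `PreconditionFailed` and `set_ssh_key` are illustrative names, not SkyPilot or googleapiclient APIs. The real metadata `items` field is a list of key/value entries, which is simplified here to a dict. This is not the fix that was applied for this issue.

```python
import secrets


class PreconditionFailed(Exception):
    """Hypothetical stand-in for the HTTP 412 'conditionNotMet' error."""


class FakeProjectMetadata:
    """Hypothetical in-memory model of GCE project metadata (not a real API)."""

    def __init__(self, items):
        self._items = dict(items)
        self._fingerprint = secrets.token_hex(8)

    def get(self):
        # Return a copy of the items together with the current fingerprint.
        return {"items": dict(self._items), "fingerprint": self._fingerprint}

    def set(self, items, fingerprint):
        # Reject any write that was based on a stale read.
        if fingerprint != self._fingerprint:
            raise PreconditionFailed(
                "Supplied fingerprint does not match current metadata fingerprint.")
        self._items = dict(items)
        self._fingerprint = secrets.token_hex(8)  # every write gets a new fingerprint


def set_ssh_key(store, user, public_key, max_attempts=3):
    """Add or replace `user`'s entry in 'ssh-keys', re-reading on conflict."""
    for _ in range(max_attempts):
        current = store.get()  # always start from a fresh read
        lines = [line for line in current["items"].get("ssh-keys", "").splitlines()
                 if line and not line.startswith(f"{user}:")]
        lines.append(f"{user}:{public_key}")
        items = dict(current["items"])
        items["ssh-keys"] = "\n".join(lines)
        try:
            store.set(items, current["fingerprint"])
            return items
        except PreconditionFailed:
            continue  # metadata changed after our read; retry with the new fingerprint
    raise RuntimeError("metadata update kept conflicting")
```

Any writer that holds a fingerprint from before another update (for example, one computed before a key was deleted and regenerated) gets the same kind of rejection as in the traceback, while `set_ssh_key` recovers by reading the metadata again before each attempt.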
Michaelvll commented 2 years ago

Related to this: ray up creates a new spot controller because the private key referenced in the Ray cluster YAML has changed.
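The effect described in the comment can be illustrated with a simplified model. Assume that the cluster launcher tags each node with a hash computed from its node config and from the contents of the SSH key files listed under `auth`. Ray's autoscaler does something along these lines, but `launch_hash` below is a hypothetical re-implementation, not Ray's code. The key path `~/.ssh/sky-key` and the config values are likewise assumed for illustration. Under this model, deleting `sky-key` and generating a new one changes the hash even though the path is unchanged, so an existing controller node may no longer match the desired configuration and a new one is launched.

```python
import hashlib
import json


def launch_hash(node_config, auth, read_file):
    """Hypothetical: hash a node config together with its SSH key contents."""
    full_auth = dict(auth)
    for field in ("ssh_private_key", "ssh_public_key"):
        if field in auth:
            # Hash the key file's contents, not just its path.
            full_auth[field] = read_file(auth[field])
    blob = json.dumps([node_config, full_auth], sort_keys=True)
    return hashlib.sha1(blob.encode()).hexdigest()


# Illustrative values only; files are simulated with a dict.
files = {"~/.ssh/sky-key": "KEY-A"}
node = {"machineType": "n1-standard-4"}
auth = {"ssh_user": "gcpuser", "ssh_private_key": "~/.ssh/sky-key"}

before = launch_hash(node, auth, files.__getitem__)
files["~/.ssh/sky-key"] = "KEY-B"  # key deleted and regenerated at the same path
after = launch_hash(node, auth, files.__getitem__)
print(before != after)
```

Hashing the contents rather than the path is what makes a regenerated key visible to the launcher. The price is that any key rotation looks like a configuration change.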

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 120 days with no activity. Remove the stale label or add a comment, or this issue will be closed in 10 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 10 days with no activity.