skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

sky local up offline #3902

Open ZJU-lishuang opened 2 months ago

ZJU-lishuang commented 2 months ago

Version & Commit info:

```bash
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```

```bash
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```

```bash
sudo sed -i '/accept-nvidia-visible-devices-as-volume-mounts/c\accept-nvidia-visible-devices-as-volume-mounts = true' /etc/nvidia-container-runtime/config.toml
```
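The `sed` one-liner above uses the change (`c`) command: every line of `/etc/nvidia-container-runtime/config.toml` that contains `accept-nvidia-visible-devices-as-volume-mounts`, whether commented out or not, is replaced wholesale by `accept-nvidia-visible-devices-as-volume-mounts = true`. A minimal Python sketch of the same edit, applied to a string so it can be tried without root. `set_volume_mount_flag` is a hypothetical helper, not part of the NVIDIA tooling, and the sample input is illustrative rather than the real file.

```python
KEY = "accept-nvidia-visible-devices-as-volume-mounts"


def set_volume_mount_flag(config_text: str) -> str:
    """Rewrite config text the way the sed change command above does (sketch)."""
    lines = []
    for line in config_text.splitlines():
        # Substring match, like sed's /KEY/ address: a leading '#' still matches,
        # and the whole line is swapped for the new setting.
        lines.append(f"{KEY} = true" if KEY in line else line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative input only, not the actual contents of config.toml.
    sample = "disable-require = false\n#accept-nvidia-visible-devices-as-volume-mounts = false\n"
    print(set_volume_mount_flag(sample), end="")
```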

```bash
wget https://get.helm.sh/helm-v3.15.4-linux-amd64.tar.gz
tar -zxvf helm-v3.15.4-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm help
```

```bash
export SKYPILOT_DISABLE_USAGE_COLLECTION=1
export SKYPILOT_DEBUG=1
```

```bash
sky local up
```
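Before running `sky local up`, it can help to confirm that the binaries installed or configured by the commands above are actually on `PATH`. A sketch using only the standard library; the tool list is an assumption taken from these commands, not a requirement list published by SkyPilot, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumption: the binaries touched by the setup commands in this issue.
REQUIRED_TOOLS = ["kubectl", "helm", "docker", "nvidia-ctk"]


def missing_tools(tools):
    """Return the subset of `tools` that shutil.which cannot locate."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools found on PATH.")
```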


This is the error.

```
E 09-01 20:43:25 common.py:213] Failed to fetch Kubernetes catalog kubernetes/images.csv. Please check your internet connection.
Clusters
No existing clusters.

Traceback (most recent call last):
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
    self.sock = sock = self._new_conn()
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fc064fe39d0>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /skypilot-org/skypilot-catalog/master/catalogs/v5/kubernetes/images.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc064fe39d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ls/miniconda3/envs/sky/bin/sky", line 8, in <module>
    sys.exit(cli())
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 367, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 806, in invoke
    return super().invoke(ctx)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 1119, in launch
    _launch_with_confirm(task,
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 597, in _launch_with_confirm
    sky.launch(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/execution.py", line 458, in launch
    return _execute(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/execution.py", line 273, in _execute
    handle = backend.provision(task,
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 367, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/backends/backend.py", line 57, in provision
    return self._provision(task, to_provision, dryrun, stream_logs,
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/backends/cloud_vm_ray_backend.py", line 2759, in _provision
    config_dict = retry_provisioner.provision_with_retries(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/backends/cloud_vm_ray_backend.py", line 1952, in provision_with_retries
    config_dict = self._retry_zones(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/backends/cloud_vm_ray_backend.py", line 1387, in _retry_zones
    config_dict = backend_utils.write_cluster_config(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 388, in _record
    return f(*args, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/backends/backend_utils.py", line 799, in write_cluster_config
    resources_vars = to_provision.make_deploy_variables(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/resources.py", line 1043, in make_deploy_variables
    cloud_specific_variables = self.cloud.make_deploy_resources_variables(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/kubernetes.py", line 264, in make_deploy_resources_variables
    ssh_jump_image = service_catalog.get_image_id_from_tag(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/service_catalog/__init__.py", line 351, in get_image_id_from_tag
    return _map_clouds_catalog(clouds, 'get_image_id_from_tag', tag, region)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/service_catalog/__init__.py", line 37, in _map_clouds_catalog
    cloud_module = importlib.import_module(
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/service_catalog/kubernetes_catalog.py", line 25, in <module>
    _image_df = common.read_catalog('kubernetes/images.csv',
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/service_catalog/common.py", line 217, in read_catalog
    raise e
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/sky/clouds/service_catalog/common.py", line 202, in read_catalog
    r = requests.get(url)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/ls/miniconda3/envs/sky/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /skypilot-org/skypilot-catalog/master/catalogs/v5/kubernetes/images.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc064fe39d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
```
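The innermost error in this traceback is a `ConnectionRefusedError` raised by `sock.connect(sa)` while opening a TCP connection to `raw.githubusercontent.com:443`; `urllib3` and `requests` only wrap it. Whether the host can reach that server at all can be checked independently of SkyPilot with a plain socket connection. A minimal standard-library sketch; `can_reach` is a hypothetical helper name.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # A plain TCP connect: the step that failed in the traceback,
        # before any TLS or HTTP traffic is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and DNS errors are all OSError subclasses.
        return False
```

Running `can_reach("raw.githubusercontent.com")` on the affected machine and on one with normal internet access helps separate a blocked network path from a SkyPilot problem, which is what the reply in this thread suspects.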



I found the relevant code [here](https://github.com/skypilot-org/skypilot/blob/50f68d2093cbe9dc7da6d53c3c17c45e0b97b84c/sky/clouds/service_catalog/constants.py).
How can I set `HOSTED_CATALOG_DIR_URL` so that `sky local up` works offline?
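The failing URL in the error suggests how catalog files are addressed: a base URL (presumably the value of `HOSTED_CATALOG_DIR_URL`, here `https://raw.githubusercontent.com/skypilot-org/skypilot-catalog/master/catalogs`), then a schema version (`v5`), then the per-cloud file (`kubernetes/images.csv`). Assuming the local cache under `~/.sky/catalogs` mirrors that layout (an assumption, not confirmed in this thread), the sketch below maps a catalog URL taken from an error message to the local path where a manually downloaded copy would go. `CATALOG_BASE_URL` and `catalog_url_to_local_path` are hypothetical names for this illustration, not SkyPilot APIs.

```python
from pathlib import Path
from urllib.parse import urlparse

# Inferred from the URL in the traceback; assumed to equal HOSTED_CATALOG_DIR_URL.
CATALOG_BASE_URL = (
    "https://raw.githubusercontent.com/skypilot-org/skypilot-catalog/master/catalogs"
)


def catalog_url_to_local_path(url, home=None):
    """Map a hosted catalog URL to its assumed location under ~/.sky/catalogs."""
    home = Path.home() if home is None else Path(home)
    base = urlparse(CATALOG_BASE_URL)
    parsed = urlparse(url)
    prefix = base.path.rstrip("/") + "/"
    if parsed.netloc != base.netloc or not parsed.path.startswith(prefix):
        raise ValueError(f"not a hosted catalog URL: {url}")
    # Remaining part, e.g. "v5/kubernetes/images.csv" (schema version + cloud + file).
    relative = parsed.path[len(prefix):]
    return home / ".sky" / "catalogs" / relative


if __name__ == "__main__":
    print(catalog_url_to_local_path(CATALOG_BASE_URL + "/v5/kubernetes/images.csv"))
```

For the user in this issue (home directory `/home/ls`), the file from the error would map to `/home/ls/.sky/catalogs/v5/kubernetes/images.csv` under this assumption.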
romilbhardwaj commented 2 months ago

Hi @ZJU-lishuang, it looks like you may be behind a firewall that blocks connections to our catalog. Could you try manually downloading the catalogs directory and placing it at `~/.sky/catalogs`?
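One way to follow this suggestion, sketched under assumptions: on a machine with internet access, obtain the `catalogs` directory of the `skypilot-org/skypilot-catalog` repository (repository and path as they appear in the URL from the traceback), carry it to the offline host, and merge it into `~/.sky/catalogs`. The helper below only performs that last step. Its name, `install_catalogs`, is hypothetical, and it assumes the source directory contains schema-version subdirectories such as `v5/kubernetes/images.csv`, mirroring the hosted layout.

```python
import shutil
from pathlib import Path


def install_catalogs(source, home=None):
    """Merge a downloaded `catalogs` directory into ~/.sky/catalogs (sketch)."""
    home = Path.home() if home is None else Path(home)
    target = home / ".sky" / "catalogs"
    target.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok=True (Python 3.8+) merges into an existing cache;
    # files with the same relative path are overwritten.
    shutil.copytree(source, target, dirs_exist_ok=True)
    # List the installed CSV files relative to the cache root for a quick check.
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*.csv"))
```

For example, `install_catalogs(Path("/path/to/skypilot-catalog/catalogs"))` would merge a copied checkout into the cache (the path is a placeholder). Whether SkyPilot then uses the local copy instead of failing on the download is what the suggestion relies on; the thread does not yet confirm it.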

ZJU-lishuang commented 2 months ago

I will try it.