skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[cudo] Unable to setup credentials on cudo #3385

Closed: romilbhardwaj closed this 1 week ago

romilbhardwaj commented 5 months ago

Running sky check after following installation instructions fails with KeyError:

(base) root@f81a9cc47641:/sky_repo/skypilot# sky check
Checking credentials to enable clouds for SkyPilot.
...

  Checking Cudo...Traceback (most recent call last):
  File "/opt/conda/bin/sky", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/sky_repo/skypilot/sky/utils/common_utils.py", line 349, in _record
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/sky_repo/skypilot/sky/cli.py", line 803, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sky_repo/skypilot/sky/utils/common_utils.py", line 370, in _record
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/sky_repo/skypilot/sky/cli.py", line 2812, in check
    sky_check.check(verbose=verbose)
  File "/sky_repo/skypilot/sky/check.py", line 48, in check
    check_one_cloud(cloud_tuple)
  File "/sky_repo/skypilot/sky/check.py", line 25, in check_one_cloud
    ok, reason = cloud.check_credentials()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sky_repo/skypilot/sky/clouds/cudo.py", line 293, in check_credentials
    project_id, error = cudo_api.get_project_id()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/cudo_compute/cudo_api.py", line 34, in get_project_id
    return context_config['project'], None
           ~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'project'

Looks like the format of the credentials file generated by cudoctl has changed?

Full logs from running on berkeleyskypilot/skypilot-debug here: https://gist.github.com/romilbhardwaj/7324c222196248c06f92962201140cb1

Versions:

(base) root@f81a9cc47641:/sky_repo/skypilot# cudoctl --version
cudoctl version 0.3.2, commit d495ef5b4e9f8ff37738adc08765567702cec2a5, built at 2023-11-04T19:43:01Z
(base) root@f81a9cc47641:/sky_repo/skypilot# pip freeze | grep cudo
cudo-compute==0.1.8

Here's the config generated by cudoctl:

(base) ➜  ~ cat ~/.config/cudo/cudo.yml
keys:
    - key: <redacted>
      name: sky
configVersion: v0
contexts:
    - name: sky
      key: sky
      data-center: br-saopaulo-1
current-context: sky

cc @JungleCatSW

JungleCatSW commented 4 months ago

Hi, thanks. I will take a look.

JungleCatSW commented 4 months ago

The issue was that if no project had been created in the account, or if the command-line tool was set up before the project was created, there would be no default project in the credentials file. I have added some checks with better error messages and clearer instructions to handle this case: https://github.com/skypilot-org/skypilot/pull/3438
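
For illustration only (this is not the code from the PR): a minimal sketch of the kind of check described here. It assumes the config has already been loaded into a dict shaped like the cudo.yml in the report, and that a context carries a `project` key once a default project is set, as the traceback's `context_config['project']` suggests. The helper name `find_project_id` and the error wording are hypothetical.

```python
from typing import Optional, Tuple

# Mirrors the cudo.yml from the report (key redacted). Note that the
# 'sky' context has no 'project' entry, which is what triggered the KeyError.
config = {
    'keys': [{'key': '<redacted>', 'name': 'sky'}],
    'configVersion': 'v0',
    'contexts': [
        {'name': 'sky', 'key': 'sky', 'data-center': 'br-saopaulo-1'},
    ],
    'current-context': 'sky',
}


def find_project_id(cfg: dict) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (project_id, error) instead of raising."""
    current = cfg.get('current-context')
    for context in cfg.get('contexts') or []:
        if context.get('name') != current:
            continue
        project = context.get('project')
        if project is None:
            # Report the missing default project rather than failing
            # with a bare KeyError.
            return None, (f"No default project set in context '{current}'. "
                          'Create a project in the account, then set it as '
                          'the default for this context.')
        return project, None
    return None, f"Context '{current}' not found in the config."


project_id, error = find_project_id(config)
print(project_id, error)
```

With the config from the report, the helper returns no project ID and a readable error message, which a caller such as `check_credentials` could surface to the user instead of a traceback.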

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

romilbhardwaj commented 1 week ago

This is closed with #3438.