skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.84k stars 518 forks

[Optimizer] Check the unsupported features before the optimization #2233

Open Michaelvll opened 1 year ago

Michaelvll commented 1 year ago

Some clouds do not support stopping a cluster. However, this is only checked during failover, so those clouds still show up in the optimizer table, which can cause confusion. We should check for unsupported features before the optimization.

To reproduce: sky launch -i 0 --cpus 2

I 07-13 20:45:56 optimizer.py:732] Considered resources (1 node):
I 07-13 20:45:56 optimizer.py:781] ---------------------------------------------------------------------------------------------------
I 07-13 20:45:56 optimizer.py:781]  CLOUD        INSTANCE          vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN
I 07-13 20:45:56 optimizer.py:781] ---------------------------------------------------------------------------------------------------
I 07-13 20:45:56 optimizer.py:781]  Kubernetes   2CPU--2GB         2       2         -              kubernetes    0.00          ✔
I 07-13 20:45:56 optimizer.py:781]  AWS          m6i.large         2       8         -              us-east-1     0.10
I 07-13 20:45:56 optimizer.py:781]  Azure        Standard_D2s_v5   2       8         -              eastus        0.10
I 07-13 20:45:56 optimizer.py:781]  GCP          n2-standard-2     2       8         -              us-central1   0.10
I 07-13 20:45:56 optimizer.py:781]  IBM          bx2-8x32          8       32        -              us-east       0.38
I 07-13 20:45:56 optimizer.py:781]  Lambda       gpu_1x_a10        30      200       A10:1          us-east-1     0.60
I 07-13 20:45:56 optimizer.py:781] ---------------------------------------------------------------------------------------------------
I 07-13 20:45:56 optimizer.py:781]
Launching a new cluster 'test-cpu'. Proceed? [Y/n]:

Here, Lambda and Kubernetes do not support stopping the cluster, yet both appear in the table and Kubernetes is even marked as chosen.
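A rough sketch of the proposed behavior, for discussion only. It does not use SkyPilot's actual classes: `Feature`, `Candidate`, `UNSUPPORTED`, `required_features`, and `filter_candidates` are all hypothetical names, and the unsupported-feature table only encodes what this issue states (Lambda and Kubernetes cannot stop a cluster). It also assumes that `-i 0` without `--down` means the cluster must be stoppable. The idea is to drop candidates that lack a required feature before the cost comparison, rather than discovering the gap during failover.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Feature(Enum):
    """Hypothetical feature flags; not SkyPilot's real enum."""
    STOP = auto()
    AUTOSTOP = auto()


@dataclass(frozen=True)
class Candidate:
    """One row of the optimizer table (illustrative fields only)."""
    cloud: str
    instance: str
    cost: float


# Assumption: taken from this issue only (Lambda and Kubernetes cannot stop).
UNSUPPORTED = {
    "Kubernetes": {Feature.STOP, Feature.AUTOSTOP},
    "Lambda": {Feature.STOP, Feature.AUTOSTOP},
}

# Rows copied from the log output above.
CANDIDATES = [
    Candidate("Kubernetes", "2CPU--2GB", 0.00),
    Candidate("AWS", "m6i.large", 0.10),
    Candidate("Azure", "Standard_D2s_v5", 0.10),
    Candidate("GCP", "n2-standard-2", 0.10),
    Candidate("IBM", "bx2-8x32", 0.38),
    Candidate("Lambda", "gpu_1x_a10", 0.60),
]


def required_features(idle_minutes_to_autostop, down=False):
    """Map launch options to the features the chosen cloud must support.

    Assumption: autostop without teardown requires STOP and AUTOSTOP.
    """
    if idle_minutes_to_autostop is not None and not down:
        return {Feature.STOP, Feature.AUTOSTOP}
    return set()


def filter_candidates(candidates, required):
    """Split candidates into (kept, dropped) before any cost comparison.

    `dropped` maps each excluded cloud to the names of its missing features,
    so the reason can be shown to the user instead of silently hiding rows.
    """
    kept, dropped = [], {}
    for cand in candidates:
        missing = required & UNSUPPORTED.get(cand.cloud, set())
        if missing:
            dropped[cand.cloud] = sorted(f.name for f in missing)
        else:
            kept.append(cand)
    return kept, dropped


if __name__ == "__main__":
    # Mirrors `sky launch -i 0 --cpus 2` from the reproduction step.
    kept, dropped = filter_candidates(CANDIDATES, required_features(0))
    print("considered:", [c.cloud for c in kept])
    print("excluded:", dropped)
    print("chosen:", min(kept, key=lambda c: c.cost).cloud)
```

With a filter like this placed ahead of the optimizer, the table would list only clouds that can satisfy the request, and the excluded clouds could be reported with the reason in a separate log line.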

Saikrishna-Achalla commented 1 year ago

I'd like to take this up!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 10 days with no activity.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 3 months ago

This issue was closed because it has been stalled for 10 days with no activity.