skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[UX] Better logging for unsupported features #3911

Open Michaelvll opened 1 week ago

Michaelvll commented 1 week ago

When a cloud does not support a requested feature, we still show a ResourcesUnavailableError at the end, which users have reported as confusing.

romilbhardwaj commented 6 days ago

Minimal example:

$ sky local up
...
# Try with autostop on k8s
$ sky launch -c k8s --cloud kubernetes -i 10
== Optimizer ==
Estimated cost: $0.0 / hour

Considered resources (1 node):
---------------------------------------------------------------------------------------------
 CLOUD        INSTANCE    vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN
---------------------------------------------------------------------------------------------
 Kubernetes   2CPU--2GB   2       2         -              kubernetes    0.00          ✔
---------------------------------------------------------------------------------------------

Launching a new cluster 'k8s'. Proceed? [Y/n]:
Creating a new cluster: 'k8s' [1x Kubernetes(2CPU--2GB)].
Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
sky.exceptions.NotSupportedError: The following features are not supported by Kubernetes:
    Feature  Reason
    stop     Kubernetes does not support stopping VMs.

Provision failed for 1x Kubernetes(2CPU--2GB) in kubernetes. Trying other locations (if any).

sky.exceptions.ResourcesUnavailableError: Failed to provision all possible launchable resources. Relax the task's resource requirements: 1x Kubernetes()
To keep retrying until the cluster is up, use the `--retry-until-up` flag.

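Ideally, we should suppress the ResourcesUnavailableError here. The launch failed because Kubernetes does not support a requested feature (stopping, which autostop relies on), not because resources were unavailable, so the suggestion to relax the task's resource requirements is misleading.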
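One possible way to do this is to have the failover loop record why each candidate failed. The sketch below uses a simplified loop. `provision_with_failover` and the two exception classes are hypothetical stand-ins for SkyPilot's provisioner and `sky.exceptions`, not its actual code. If every candidate failed with `NotSupportedError`, the loop re-raises that error directly instead of the generic `ResourcesUnavailableError`, because neither relaxing resources nor trying other locations can make an unsupported feature work.

```python
# Hypothetical sketch, not SkyPilot's implementation. The exception classes
# are local stand-ins for sky.exceptions.NotSupportedError and
# sky.exceptions.ResourcesUnavailableError.
from typing import Callable, List, Sequence


class NotSupportedError(Exception):
    """Stand-in: the cloud does not support a requested feature."""


class ResourcesUnavailableError(Exception):
    """Stand-in: no candidate resource could be provisioned."""


def provision_with_failover(
    candidates: Sequence[str],
    provision_fn: Callable[[str], str],
) -> str:
    """Try each candidate in order and return the first success.

    If every failure was a NotSupportedError, re-raise the last one as-is
    rather than wrapping everything in a ResourcesUnavailableError.
    """
    errors: List[Exception] = []
    for candidate in candidates:
        try:
            return provision_fn(candidate)
        except (NotSupportedError, ResourcesUnavailableError) as e:
            errors.append(e)  # Remember why this candidate failed.
    if errors and all(isinstance(e, NotSupportedError) for e in errors):
        # Only unsupported features were hit: surface that error directly.
        raise errors[-1]
    raise ResourcesUnavailableError(
        'Failed to provision all possible launchable resources.')


def fake_k8s_provision(location: str) -> str:
    # Hypothetical provisioner mimicking the Kubernetes failure above.
    raise NotSupportedError('stop: Kubernetes does not support stopping VMs.')


try:
    provision_with_failover(['kubernetes'], fake_k8s_provision)
except NotSupportedError as e:
    print(f'NotSupportedError: {e}')
```

When failures are mixed (some candidates unsupported, some out of capacity), this sketch still raises `ResourcesUnavailableError`, since the capacity failures might be resolved by retrying. A fuller fix could also list the unsupported features in that message.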