skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

runpod 4090 spot not available #4265

Open alita-moore opened 2 weeks ago

alita-moore commented 2 weeks ago

RunPod offers 4090 spot instances on their website, but when I try to reserve one using SkyPilot, I get an error:

sky launch -c mycluster hello-sky.yaml --use-spot
Task from YAML spec: hello-sky.yaml
No resource satisfying RunPod([Spot], {'RTX4090': 1}) on RunPod.
sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request: 1x RunPod([Spot], {'RTX4090': 1}).
To fix: relax or change the resource requirements.

Hint: sky show-gpus to list available accelerators.
      sky check to check the enabled clouds.

Version & Commit info:

Michaelvll commented 2 weeks ago

SkyPilot does not support spot instances on RunPod yet, because RunPod's APIs do not offer spot instance support yet. Related to #3927.

alita-moore commented 2 weeks ago

Oh cool, will this be added soon?

kldzj commented 1 week ago

@Michaelvll they have support for creating spot instances, see https://docs.runpod.io/sdks/graphql/manage-pods#create-spot-pod and https://graphql-spec.runpod.io/#mutation-podRentInterruptable

Michaelvll commented 1 week ago

> @Michaelvll they have support for creating spot instances, see https://docs.runpod.io/sdks/graphql/manage-pods#create-spot-pod and https://graphql-spec.runpod.io/#mutation-podRentInterruptable

@kldzj, thanks for the pointer! Spot instances are supported in their low-level GraphQL API, but not yet in the Python API we are using: https://github.com/runpod/runpod-python/issues/327

See: https://github.com/skypilot-org/skypilot/blob/294401455fca036829a342f26e1718e243bad300/sky/provision/runpod/utils.py#L145-L163

One option is to switch from the Python API to calling the GraphQL API directly over HTTP. Would you like to try adding it? Any contribution would be super helpful. : )
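
For anyone picking this up, here is a rough sketch of what calling the `podRentInterruptable` mutation directly over HTTP might look like. It is not SkyPilot code. The endpoint URL, the `api_key` query parameter, the input field names, and the GPU type ID are assumptions based on the RunPod GraphQL docs linked above, and `build_spot_pod_query` and `rent_spot_pod` are hypothetical helper names. A real request likely needs more input fields (disk sizes, cloud type, ports, and so on), so please check the spec before relying on this.

```python
import json
import urllib.request

# Assumed endpoint, taken from RunPod's public GraphQL docs (not from SkyPilot).
RUNPOD_GRAPHQL_URL = "https://api.runpod.io/graphql"


def build_spot_pod_query(name, image_name, gpu_type_id, bid_per_gpu, gpu_count=1):
    """Build a podRentInterruptable mutation string (hypothetical helper).

    The input field names are assumptions based on the linked GraphQL spec.
    """
    fields = [
        f"bidPerGpu: {bid_per_gpu}",
        f"gpuCount: {gpu_count}",
        # json.dumps produces a double-quoted, escaped string literal.
        f"gpuTypeId: {json.dumps(gpu_type_id)}",
        f"name: {json.dumps(name)}",
        f"imageName: {json.dumps(image_name)}",
    ]
    return (
        "mutation { podRentInterruptable(input: { %s }) { id machineId } }"
        % ", ".join(fields)
    )


def rent_spot_pod(api_key, **pod_options):
    """POST the mutation to the GraphQL endpoint and return the pod data."""
    body = json.dumps({"query": build_spot_pod_query(**pod_options)}).encode("utf-8")
    request = urllib.request.Request(
        f"{RUNPOD_GRAPHQL_URL}?api_key={api_key}",  # assumed auth scheme
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # GraphQL servers report failures in an "errors" list in the response body.
    if "errors" in payload:
        raise RuntimeError(payload["errors"])
    return payload["data"]["podRentInterruptable"]
```

Usage would be something like `rent_spot_pod(api_key, name="sky-spot", image_name="runpod/pytorch", gpu_type_id="NVIDIA GeForce RTX 4090", bid_per_gpu=0.3)`, where the image name, GPU type ID, and bid are placeholder values. Keeping the query builder separate from the HTTP call makes it possible to unit test the request shape without a RunPod account.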