skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[k8s] Support a custom GPU resource name if it's not nvidia.com/gpu #4337

Open nkwangleiGIT opened 1 week ago

nkwangleiGIT commented 1 week ago

We use custom GPU resource names with the device plugin, so our nodes advertise the following GPU resources in their capacity:

nvidia.com/gpu-h100
nvidia.com/gpu-h20
nvidia.com/gpu-4090

So we have to use resource names like the ones above when launching to a local Kubernetes cluster, for example:

sky launch --image-id skypilot:20240613 --cpus 8 --memory 32 --gpus gpu-3090:2 -c my-sky-cluster --cloud kubernetes

This PR adds support for reading CUSTOM_GPU_RESOURCE_NAME from an environment variable to override the default nvidia.com/gpu resource name in the resource requests.
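
To illustrate the idea, here is a minimal Python sketch of the override, not the actual diff in this PR. Only CUSTOM_GPU_RESOURCE_NAME and the nvidia.com/gpu default come from the description above; the helper names `gpu_resource_name` and `gpu_resource_limits` are hypothetical, and the assumption that the variable holds the full resource name (for example nvidia.com/gpu-h100) is a guess about the intended usage.

```python
import os

# Default extended resource name used for NVIDIA GPUs on Kubernetes.
DEFAULT_GPU_RESOURCE_NAME = "nvidia.com/gpu"


def gpu_resource_name(env=None):
    """Return the GPU resource name, honoring CUSTOM_GPU_RESOURCE_NAME.

    Hypothetical helper: `env` defaults to os.environ and is injectable
    so the lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    name = env.get("CUSTOM_GPU_RESOURCE_NAME", "").strip()
    # An unset or empty variable falls back to the default name.
    return name or DEFAULT_GPU_RESOURCE_NAME


def gpu_resource_limits(count, env=None):
    """Build the `limits` mapping for a pod spec (hypothetical helper)."""
    return {gpu_resource_name(env): str(count)}
```

With the variable unset, `gpu_resource_limits(2)` keys the limit on nvidia.com/gpu; with CUSTOM_GPU_RESOURCE_NAME=nvidia.com/gpu-h100 exported, the same call keys it on nvidia.com/gpu-h100 instead.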

Tested (run the relevant ones):

romilbhardwaj commented 1 week ago

Hi @nkwangleiGIT - this PR was close to being merged. Would you like to reopen it?