Open nkwangleiGIT opened 1 week ago
We use custom GPU resource names with the device plugin, so our nodes report the following GPUs in their capacity:
nvidia.com/gpu-h100 nvidia.com/gpu-h20 nvidia.com/gpu-4090
So we have to use these custom resource names when launching to a local Kubernetes cluster, for example:
sky launch --image-id skypilot:20240613 --cpus 8 --memory 32 --gpus gpu-3090:2 -c my-sky-cluster --cloud kubernetes
This PR adds support for reading CUSTOM_GPU_RESOURCE_NAME from the environment to override the default nvidia.com/gpu resource name in the resources.
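To illustrate the idea, here is a minimal sketch of the override, not the actual diff in this PR. The helper names `get_gpu_resource_name` and `build_gpu_limits` are hypothetical, and so is the assumption that an empty value falls back to the default; only the `CUSTOM_GPU_RESOURCE_NAME` variable and the `nvidia.com/gpu` default come from the description above.

```python
import os

# Default resource name exposed by the standard NVIDIA device plugin.
DEFAULT_GPU_RESOURCE_NAME = 'nvidia.com/gpu'


def get_gpu_resource_name(env=None):
    """Return the GPU resource name to request (hypothetical helper).

    Reads CUSTOM_GPU_RESOURCE_NAME from the given mapping (os.environ by
    default). An unset or empty value falls back to the default; that
    fallback rule is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return env.get('CUSTOM_GPU_RESOURCE_NAME') or DEFAULT_GPU_RESOURCE_NAME


def build_gpu_limits(count, env=None):
    """Build the GPU entry of a container's resource limits (hypothetical)."""
    return {get_gpu_resource_name(env): str(count)}
```

With `CUSTOM_GPU_RESOURCE_NAME=nvidia.com/gpu-h100` exported, `build_gpu_limits(2)` would request two units of `nvidia.com/gpu-h100` instead of `nvidia.com/gpu`.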
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh
Hi @nkwangleiGIT - this PR was close to being merged. Would you like to reopen it?