SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
The latest released vLLM, v0.6.4.post1, has a dependency issue, so this PR pins vLLM to v0.6.3.post1 in `serve-qwen-7b.yaml`, which works fine. Running `serve-qwen-7b.yaml` with vLLM v0.6.4.post1 fails with the following error:
```
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
(qwen-llm2, pid=9356)     return importlib.import_module("." + module_name, self.name)
(qwen-llm2, pid=9356)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/importlib/__init__.py", line 90, in import_module
(qwen-llm2, pid=9356)     return _bootstrap._gcd_import(name[level:], package, level)
(qwen-llm2, pid=9356)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap_external>", line 995, in exec_module
(qwen-llm2, pid=9356)   File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/transformers/processing_utils.py", line 33, in <module>
(qwen-llm2, pid=9356)     from .image_utils import ChannelDimension, is_valid_image, is_vision_available
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/transformers/image_utils.py", line 58, in <module>
(qwen-llm2, pid=9356)     from torchvision.transforms import InterpolationMode
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/torchvision/__init__.py", line 10, in <module>
(qwen-llm2, pid=9356)     from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
(qwen-llm2, pid=9356)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
(qwen-llm2, pid=9356)     @torch.library.register_fake("torchvision::nms")
(qwen-llm2, pid=9356)      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 654, in register
(qwen-llm2, pid=9356)     use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 154, in _register_fake
(qwen-llm2, pid=9356)     handle = entry.abstract_impl.register(func_to_register, source)
(qwen-llm2, pid=9356)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/torch/_library/abstract_impl.py", line 31, in register
(qwen-llm2, pid=9356)     if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
(qwen-llm2, pid=9356)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(qwen-llm2, pid=9356) RuntimeError: operator torchvision::nms does not exist
(qwen-llm2, pid=9356)
(qwen-llm2, pid=9356) The above exception was the direct cause of the following exception:
(qwen-llm2, pid=9356)
(qwen-llm2, pid=9356) Traceback (most recent call last):
(qwen-llm2, pid=9356)   File "<frozen runpy>", line 189, in _run_module_as_main
(qwen-llm2, pid=9356)   File "<frozen runpy>", line 112, in _get_module_details
(qwen-llm2, pid=9356)   File "/home/ubuntu/miniforge3/envs/vllm/lib/python3.12/site-packages/vllm/__init__.py", line 3, in <module>
```
Manually tested by launching the service:

```bash
sky serve up examples/oci/serve-qwen-7b.yaml -n qwen-llm
```
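For reference, the version pin amounts to installing a fixed vLLM release in the task's `setup` step of the SkyPilot YAML. A sketch of the relevant fragment (the exact commands and surrounding fields in `serve-qwen-7b.yaml` may differ):

```yaml
# Sketch of the relevant part of serve-qwen-7b.yaml; only the vLLM pin
# is the point here, other setup commands are omitted.
setup: |
  pip install "vllm==0.6.3.post1"
```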
Tested (run the relevant ones):
- [ ] Code formatting: `bash format.sh`
- [ ] Any manual or new tests for this PR (please specify below)
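The `RuntimeError: operator torchvision::nms does not exist` above is a symptom of mismatched `torch`/`torchvision` wheels pulled in by the newer vLLM release. As a hypothetical guard (not part of this PR), a setup script could verify the installed version against the pin before starting the server; `is_pinned` and `PINNED_VLLM` below are illustrative names, not SkyPilot or vLLM APIs:

```python
from importlib.metadata import PackageNotFoundError, version

# The version this PR pins in serve-qwen-7b.yaml.
PINNED_VLLM = "0.6.3.post1"


def is_pinned(package: str, pinned: str) -> bool:
    """Return True iff `package` is installed at exactly the `pinned` version."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        # Not installed at all, so certainly not at the pinned version.
        return False


if __name__ == "__main__":
    # Prints False in an environment without the pinned vLLM install.
    print("vllm pinned correctly:", is_pinned("vllm", PINNED_VLLM))
```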