WesleyYue closed 4 months ago
Full logs:
Task from YAML spec: x.yaml
W 06-07 16:53:05 aws_catalog.py:173] Failed to fetch availability zone mapping. ImportError: Failed to import dependencies for AWS. Try pip install "skypilot[aws]"
W 06-07 16:53:06 aws_catalog.py:173] Failed to fetch availability zone mapping. ImportError: Failed to import dependencies for AWS. Try pip install "skypilot[aws]"
I 06-07 16:53:06 cli.py:1112] Service section will be ignored when using `sky launch`.
I 06-07 16:53:06 cli.py:1112] To spin up a service, use SkyServe CLI: sky serve up
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A100': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'L4': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A10G': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A40': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'L40': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:695] == Optimizer ==
I 06-07 16:53:06 optimizer.py:706] Target: minimizing cost
I 06-07 16:53:06 optimizer.py:718] Estimated cost: $0.5 / hour
I 06-07 16:53:06 optimizer.py:718]
I 06-07 16:53:06 optimizer.py:843] Considered resources (1 node):
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913] CLOUD   INSTANCE                   vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913] Azure   Standard_NV6ads_A10_v5     6       55        A10:1          westus3       0.45       ✔
I 06-07 16:53:06 optimizer.py:913] Azure   Standard_NC24ads_A100_v4   24      220       A100-80GB:1    westus3       3.67
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913]
I 06-07 16:53:06 optimizer.py:931] Multiple Azure instances satisfy A10:1. The cheapest Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8000']) is considered among:
I 06-07 16:53:06 optimizer.py:931] ['Standard_NV6ads_A10_v5', 'Standard_NV12ads_A10_v5', 'Standard_NV18ads_A10_v5', 'Standard_NV36ads_A10_v5', 'Standard_NV36adms_A10_v5'].
I 06-07 16:53:06 optimizer.py:931]
I 06-07 16:53:06 optimizer.py:937] To list more details, run 'sky show-gpus A10'.
Launching a new cluster 'qwen'. Proceed? [Y/n]: y
I 06-07 16:53:10 cloud_vm_ray_backend.py:4397] Creating a new cluster: 'qwen' [1x Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8000'])].
I 06-07 16:53:10 cloud_vm_ray_backend.py:4397] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 06-07 16:53:13 cloud_vm_ray_backend.py:1385] To view detailed progress: tail -n100 -f /Users/wesley/sky_logs/sky-2024-06-07-16-53-06-502155/provision.log
I 06-07 16:53:13 cloud_vm_ray_backend.py:1779] Launching on Azure westus3
I 06-07 16:57:13 log_utils.py:45] Head node is up.
I 06-07 17:06:59 cloud_vm_ray_backend.py:1627] Successfully provisioned or found existing VM.
I 06-07 17:07:03 cloud_vm_ray_backend.py:3215] Running setup on 1 node.
[skypilot.yaml] Activating conda environment 'qwen'
EnvironmentNameNotFound: Could not find conda environment: qwen
You can list all discoverable environments with `conda info --envs`.
[skypilot.yaml] Creating new conda environment 'qwen' with Python 3.10
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: /home/azureuser/miniconda3/envs/qwen
added / updated specs:
- python=3.10
The following packages will be downloaded:
package | build
---------------------------|-----------------
bzip2-1.0.8 | h5eee18b_6 262 KB
ca-certificates-2024.3.11 | h06a4308_0 127 KB
libffi-3.4.4 | h6a678d5_1 141 KB
openssl-3.0.13 | h7f8727e_2 5.2 MB
pip-24.0 | py310h06a4308_0 2.7 MB
python-3.10.14 | h955ad1f_1 26.8 MB
setuptools-69.5.1 | py310h06a4308_0 1012 KB
sqlite-3.45.3 | h5eee18b_0 1.2 MB
tk-8.6.14 | h39e8969_0 3.4 MB
tzdata-2024a | h04d1e81_0 116 KB
wheel-0.43.0 | py310h06a4308_0 110 KB
xz-5.4.6 | h5eee18b_1 643 KB
zlib-1.2.13 | h5eee18b_1 111 KB
------------------------------------------------------------
Total: 41.8 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
ca-certificates pkgs/main/linux-64::ca-certificates-2024.3.11-h06a4308_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl pkgs/main/linux-64::openssl-3.0.13-h7f8727e_2
pip pkgs/main/linux-64::pip-24.0-py310h06a4308_0
python pkgs/main/linux-64::python-3.10.14-h955ad1f_1
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-69.5.1-py310h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0
tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0
tzdata pkgs/main/noarch::tzdata-2024a-h04d1e81_0
wheel pkgs/main/linux-64::wheel-0.43.0-py310h06a4308_0
xz pkgs/main/linux-64::xz-5.4.6-h5eee18b_1
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1
Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
#
# To activate this environment, use
#
# $ conda activate qwen
#
# To deactivate an active environment, use
#
# $ conda deactivate
[skypilot.yaml] Installing required packages...
Collecting vllm==0.3.2
Downloading vllm-0.3.2-cp310-cp310-manylinux1_x86_64.whl.metadata (7.5 kB)
Collecting ninja (from vllm==0.3.2)
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting psutil (from vllm==0.3.2)
Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
Collecting ray>=2.9 (from vllm==0.3.2)
Downloading ray-2.24.0-cp310-cp310-manylinux2014_x86_64.whl.metadata (13 kB)
Collecting sentencepiece (from vllm==0.3.2)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting numpy (from vllm==0.3.2)
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.3 MB/s eta 0:00:00
Collecting torch==2.1.2 (from vllm==0.3.2)
Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting transformers>=4.38.0 (from vllm==0.3.2)
Downloading transformers-4.41.2-py3-none-any.whl.metadata (43 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 3.8 MB/s eta 0:00:00
Collecting xformers==0.0.23.post1 (from vllm==0.3.2)
Downloading xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.0 kB)
Collecting fastapi (from vllm==0.3.2)
Downloading fastapi-0.111.0-py3-none-any.whl.metadata (25 kB)
Collecting uvicorn[standard] (from vllm==0.3.2)
Downloading uvicorn-0.30.1-py3-none-any.whl.metadata (6.3 kB)
Collecting pydantic>=2.0 (from vllm==0.3.2)
Downloading pydantic-2.7.3-py3-none-any.whl.metadata (108 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.0/109.0 kB 5.9 MB/s eta 0:00:00
Collecting aioprometheus[starlette] (from vllm==0.3.2)
Downloading aioprometheus-23.12.0-py3-none-any.whl.metadata (9.8 kB)
Collecting pynvml==11.5.0 (from vllm==0.3.2)
Downloading pynvml-11.5.0-py3-none-any.whl.metadata (7.8 kB)
Collecting triton>=2.1.0 (from vllm==0.3.2)
Downloading triton-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting cupy-cuda12x==12.1.0 (from vllm==0.3.2)
Downloading cupy_cuda12x-12.1.0-cp310-cp310-manylinux2014_x86_64.whl.metadata (2.6 kB)
Collecting fastrlock>=0.5 (from cupy-cuda12x==12.1.0->vllm==0.3.2)
Downloading fastrlock-0.8.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata (9.3 kB)
Collecting filelock (from torch==2.1.2->vllm==0.3.2)
Downloading filelock-3.14.0-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions (from torch==2.1.2->vllm==0.3.2)
Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch==2.1.2->vllm==0.3.2)
Downloading sympy-1.12.1-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch==2.1.2->vllm==0.3.2)
Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting jinja2 (from torch==2.1.2->vllm==0.3.2)
Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch==2.1.2->vllm==0.3.2)
Downloading fsspec-2024.6.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.18.1 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)
Collecting triton>=2.1.0 (from vllm==0.3.2)
Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.1.2->vllm==0.3.2)
Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting annotated-types>=0.4.0 (from pydantic>=2.0->vllm==0.3.2)
Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.18.4 (from pydantic>=2.0->vllm==0.3.2)
Downloading pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Collecting click>=7.0 (from ray>=2.9->vllm==0.3.2)
Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting jsonschema (from ray>=2.9->vllm==0.3.2)
Downloading jsonschema-4.22.0-py3-none-any.whl.metadata (8.2 kB)
Collecting msgpack<2.0.0,>=1.0.0 (from ray>=2.9->vllm==0.3.2)
Downloading msgpack-1.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Collecting packaging (from ray>=2.9->vllm==0.3.2)
Downloading packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting protobuf!=3.19.5,>=3.15.3 (from ray>=2.9->vllm==0.3.2)
Downloading protobuf-5.27.1-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting pyyaml (from ray>=2.9->vllm==0.3.2)
Downloading PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting aiosignal (from ray>=2.9->vllm==0.3.2)
Downloading aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecting frozenlist (from ray>=2.9->vllm==0.3.2)
Downloading frozenlist-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting requests (from ray>=2.9->vllm==0.3.2)
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting huggingface-hub<1.0,>=0.23.0 (from transformers>=4.38.0->vllm==0.3.2)
Downloading huggingface_hub-0.23.3-py3-none-any.whl.metadata (12 kB)
Collecting regex!=2019.12.17 (from transformers>=4.38.0->vllm==0.3.2)
Downloading regex-2024.5.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 kB 2.7 MB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19 (from transformers>=4.38.0->vllm==0.3.2)
Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers>=4.38.0->vllm==0.3.2)
Downloading safetensors-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting tqdm>=4.27 (from transformers>=4.38.0->vllm==0.3.2)
Downloading tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 5.6 MB/s eta 0:00:00
Collecting orjson (from aioprometheus[starlette]->vllm==0.3.2)
Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (49 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.7/49.7 kB 4.7 MB/s eta 0:00:00
Collecting quantile-python>=1.1 (from aioprometheus[starlette]->vllm==0.3.2)
Downloading quantile-python-1.1.tar.gz (2.9 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting starlette>=0.14.2 (from aioprometheus[starlette]->vllm==0.3.2)
Downloading starlette-0.37.2-py3-none-any.whl.metadata (5.9 kB)
Collecting fastapi-cli>=0.0.2 (from fastapi->vllm==0.3.2)
Downloading fastapi_cli-0.0.4-py3-none-any.whl.metadata (7.0 kB)
Collecting httpx>=0.23.0 (from fastapi->vllm==0.3.2)
Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting python-multipart>=0.0.7 (from fastapi->vllm==0.3.2)
Downloading python_multipart-0.0.9-py3-none-any.whl.metadata (2.5 kB)
Collecting ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1 (from fastapi->vllm==0.3.2)
Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting email_validator>=2.0.0 (from fastapi->vllm==0.3.2)
Downloading email_validator-2.1.1-py3-none-any.whl.metadata (26 kB)
Collecting h11>=0.8 (from uvicorn[standard]->vllm==0.3.2)
Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting httptools>=0.5.0 (from uvicorn[standard]->vllm==0.3.2)
Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting python-dotenv>=0.13 (from uvicorn[standard]->vllm==0.3.2)
Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0 (from uvicorn[standard]->vllm==0.3.2)
Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting watchfiles>=0.13 (from uvicorn[standard]->vllm==0.3.2)
Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting websockets>=10.4 (from uvicorn[standard]->vllm==0.3.2)
Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting dnspython>=2.0.0 (from email_validator>=2.0.0->fastapi->vllm==0.3.2)
Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Collecting idna>=2.0.0 (from email_validator>=2.0.0->fastapi->vllm==0.3.2)
Downloading idna-3.7-py3-none-any.whl.metadata (9.9 kB)
Collecting typer>=0.12.3 (from fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading typer-0.12.3-py3-none-any.whl.metadata (15 kB)
Collecting anyio (from httpx>=0.23.0->fastapi->vllm==0.3.2)
Downloading anyio-4.4.0-py3-none-any.whl.metadata (4.6 kB)
Collecting certifi (from httpx>=0.23.0->fastapi->vllm==0.3.2)
Downloading certifi-2024.6.2-py3-none-any.whl.metadata (2.2 kB)
Collecting httpcore==1.* (from httpx>=0.23.0->fastapi->vllm==0.3.2)
Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting sniffio (from httpx>=0.23.0->fastapi->vllm==0.3.2)
Downloading sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch==2.1.2->vllm==0.3.2)
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting attrs>=22.2.0 (from jsonschema->ray>=2.9->vllm==0.3.2)
Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting jsonschema-specifications>=2023.03.6 (from jsonschema->ray>=2.9->vllm==0.3.2)
Downloading jsonschema_specifications-2023.12.1-py3-none-any.whl.metadata (3.0 kB)
Collecting referencing>=0.28.4 (from jsonschema->ray>=2.9->vllm==0.3.2)
Downloading referencing-0.35.1-py3-none-any.whl.metadata (2.8 kB)
Collecting rpds-py>=0.7.1 (from jsonschema->ray>=2.9->vllm==0.3.2)
Downloading rpds_py-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting charset-normalizer<4,>=2 (from requests->ray>=2.9->vllm==0.3.2)
Downloading charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (33 kB)
Collecting urllib3<3,>=1.21.1 (from requests->ray>=2.9->vllm==0.3.2)
Downloading urllib3-2.2.1-py3-none-any.whl.metadata (6.4 kB)
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch==2.1.2->vllm==0.3.2)
Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting exceptiongroup>=1.0.2 (from anyio->httpx>=0.23.0->fastapi->vllm==0.3.2)
Downloading exceptiongroup-1.2.1-py3-none-any.whl.metadata (6.6 kB)
Collecting shellingham>=1.3.0 (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting rich>=10.11.0 (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pygments<3.0.0,>=2.13.0 (from rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading pygments-2.18.0-py3-none-any.whl.metadata (2.5 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Downloading vllm-0.3.2-cp310-cp310-manylinux1_x86_64.whl (41.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.4/41.4 MB 16.2 MB/s eta 0:00:00
Downloading cupy_cuda12x-12.1.0-cp310-cp310-manylinux2014_x86_64.whl (83.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.0/83.0 MB 8.2 MB/s eta 0:00:00
Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 3.7 MB/s eta 0:00:00
Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 1.4 MB/s eta 0:00:00
Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 7.9 MB/s eta 0:00:00
Downloading xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl (213.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.0/213.0 MB 4.1 MB/s eta 0:00:00
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 2.3 MB/s eta 0:00:00
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 10.0 MB/s eta 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 4.9 MB/s eta 0:00:00
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 60.0 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 943.0 kB/s eta 0:00:00
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 1.1 MB/s eta 0:00:00
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 2.4 MB/s eta 0:00:00
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 4.9 MB/s eta 0:00:00
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 1.1 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl (209.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.8/209.8 MB 4.2 MB/s eta 0:00:00
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 8.1 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 39.6 MB/s eta 0:00:00
Downloading pydantic-2.7.3-py3-none-any.whl (409 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 409.6/409.6 kB 36.3 MB/s eta 0:00:00
Downloading pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 69.9 MB/s eta 0:00:00
Downloading ray-2.24.0-cp310-cp310-manylinux2014_x86_64.whl (65.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.9/65.9 MB 10.0 MB/s eta 0:00:00
Downloading transformers-4.41.2-py3-none-any.whl (9.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.1/9.1 MB 109.7 MB/s eta 0:00:00
Downloading fastapi-0.111.0-py3-none-any.whl (91 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.0/92.0 kB 9.8 MB/s eta 0:00:00
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 34.0 MB/s eta 0:00:00
Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 kB 26.0 MB/s eta 0:00:00
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 58.0 MB/s eta 0:00:00
Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Downloading click-8.1.7-py3-none-any.whl (97 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 10.3 MB/s eta 0:00:00
Downloading email_validator-2.1.1-py3-none-any.whl (30 kB)
Downloading fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)
Downloading fastrlock-0.8.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (51 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.3/51.3 kB 4.4 MB/s eta 0:00:00
Downloading h11-0.14.0-py3-none-any.whl (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 5.9 MB/s eta 0:00:00
Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 341.4/341.4 kB 33.6 MB/s eta 0:00:00
Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 8.2 MB/s eta 0:00:00
Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 9.0 MB/s eta 0:00:00
Downloading huggingface_hub-0.23.3-py3-none-any.whl (401 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 401.7/401.7 kB 37.1 MB/s eta 0:00:00
Downloading fsspec-2024.6.0-py3-none-any.whl (176 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.9/176.9 kB 19.4 MB/s eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 14.1 MB/s eta 0:00:00
Downloading msgpack-1.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 385.1/385.1 kB 34.0 MB/s eta 0:00:00
Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.5/142.5 kB 14.9 MB/s eta 0:00:00
Downloading packaging-24.0-py3-none-any.whl (53 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.5/53.5 kB 5.3 MB/s eta 0:00:00
Downloading protobuf-5.27.1-cp38-abi3-manylinux2014_x86_64.whl (309 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 309.2/309.2 kB 32.3 MB/s eta 0:00:00
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)
Downloading PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 705.5/705.5 kB 25.2 MB/s eta 0:00:00
Downloading regex-2024.5.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (775 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 775.1/775.1 kB 58.5 MB/s eta 0:00:00
Downloading safetensors-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 85.8 MB/s eta 0:00:00
Downloading starlette-0.37.2-py3-none-any.whl (71 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.9/71.9 kB 7.5 MB/s eta 0:00:00
Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 120.6 MB/s eta 0:00:00
Downloading tqdm-4.66.4-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 kB 7.3 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.6/53.6 kB 5.6 MB/s eta 0:00:00
Downloading uvicorn-0.30.1-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.4/62.4 kB 6.7 MB/s eta 0:00:00
Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 117.1 MB/s eta 0:00:00
Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 87.2 MB/s eta 0:00:00
Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.2/130.2 kB 12.7 MB/s eta 0:00:00
Downloading aioprometheus-23.12.0-py3-none-any.whl (31 kB)
Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Downloading frozenlist-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (239 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 239.5/239.5 kB 26.1 MB/s eta 0:00:00
Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
Downloading jsonschema-4.22.0-py3-none-any.whl (88 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.3/88.3 kB 9.6 MB/s eta 0:00:00
Downloading networkx-3.3-py3-none-any.whl (1.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 66.2 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 7.1 MB/s eta 0:00:00
Downloading sympy-1.12.1-py3-none-any.whl (5.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 108.7 MB/s eta 0:00:00
Downloading anyio-4.4.0-py3-none-any.whl (86 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 8.5 MB/s eta 0:00:00
Downloading attrs-23.2.0-py3-none-any.whl (60 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 6.8 MB/s eta 0:00:00
Downloading certifi-2024.6.2-py3-none-any.whl (164 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.4/164.4 kB 17.5 MB/s eta 0:00:00
Downloading charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.1/142.1 kB 15.0 MB/s eta 0:00:00
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.7/307.7 kB 30.3 MB/s eta 0:00:00
Downloading idna-3.7-py3-none-any.whl (66 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.8/66.8 kB 6.3 MB/s eta 0:00:00
Downloading jsonschema_specifications-2023.12.1-py3-none-any.whl (18 kB)
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 49.7 MB/s eta 0:00:00
Downloading referencing-0.35.1-py3-none-any.whl (26 kB)
Downloading rpds_py-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 80.7 MB/s eta 0:00:00
Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Downloading typer-0.12.3-py3-none-any.whl (47 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.2/47.2 kB 4.8 MB/s eta 0:00:00
Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.1/121.1 kB 12.6 MB/s eta 0:00:00
Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (21.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.3/21.3 MB 29.4 MB/s eta 0:00:00
Downloading exceptiongroup-1.2.1-py3-none-any.whl (16 kB)
Downloading rich-13.7.1-py3-none-any.whl (240 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.7/240.7 kB 24.6 MB/s eta 0:00:00
Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.5/87.5 kB 9.1 MB/s eta 0:00:00
Downloading pygments-2.18.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 83.0 MB/s eta 0:00:00
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Building wheels for collected packages: quantile-python
Building wheel for quantile-python (setup.py): started
Building wheel for quantile-python (setup.py): finished with status 'done'
Created wheel for quantile-python: filename=quantile_python-1.1-py3-none-any.whl size=3443 sha256=8709fab3c63a2c2ac773179e20d77a3cf88eec45ddd05b8555fceb93a5c07052
Stored in directory: /home/azureuser/.cache/pip/wheels/6d/f4/0a/0e7d01548a005f9f3fa23101f071d248da052f2a9bf2fe11c6
Successfully built quantile-python
Installing collected packages: sentencepiece, quantile-python, ninja, mpmath, fastrlock, websockets, uvloop, urllib3, ujson, typing-extensions, tqdm, sympy, sniffio, shellingham, safetensors, rpds-py, regex, pyyaml, python-multipart, python-dotenv, pynvml, pygments, psutil, protobuf, packaging, orjson, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, msgpack, mdurl, MarkupSafe, idna, httptools, h11, fsspec, frozenlist, filelock, exceptiongroup, dnspython, click, charset-normalizer, certifi, attrs, annotated-types, uvicorn, triton, requests, referencing, pydantic-core, nvidia-cusparse-cu12, nvidia-cudnn-cu12, markdown-it-py, jinja2, httpcore, email_validator, cupy-cuda12x, anyio, aiosignal, aioprometheus, watchfiles, starlette, rich, pydantic, nvidia-cusolver-cu12, jsonschema-specifications, huggingface-hub, httpx, typer, torch, tokenizers, jsonschema, xformers, transformers, ray, fastapi-cli, fastapi, vllm
Successfully installed MarkupSafe-2.1.5 aioprometheus-23.12.0 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.4.0 attrs-23.2.0 certifi-2024.6.2 charset-normalizer-3.3.2 click-8.1.7 cupy-cuda12x-12.1.0 dnspython-2.6.1 email_validator-2.1.1 exceptiongroup-1.2.1 fastapi-0.111.0 fastapi-cli-0.0.4 fastrlock-0.8.2 filelock-3.14.0 frozenlist-1.4.1 fsspec-2024.6.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 huggingface-hub-0.23.3 idna-3.7 jinja2-3.1.4 jsonschema-4.22.0 jsonschema-specifications-2023.12.1 markdown-it-py-3.0.0 mdurl-0.1.2 mpmath-1.3.0 msgpack-1.0.8 networkx-3.3 ninja-1.11.1.1 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 orjson-3.10.3 packaging-24.0 protobuf-5.27.1 psutil-5.9.8 pydantic-2.7.3 pydantic-core-2.18.4 pygments-2.18.0 pynvml-11.5.0 python-dotenv-1.0.1 python-multipart-0.0.9 pyyaml-6.0.1 quantile-python-1.1 ray-2.24.0 referencing-0.35.1 regex-2024.5.15 requests-2.32.3 rich-13.7.1 rpds-py-0.18.1 safetensors-0.4.3 sentencepiece-0.2.0 shellingham-1.5.4 sniffio-1.3.1 starlette-0.37.2 sympy-1.12.1 tokenizers-0.19.1 torch-2.1.2 tqdm-4.66.4 transformers-4.41.2 triton-2.1.0 typer-0.12.3 typing-extensions-4.12.2 ujson-5.10.0 urllib3-2.2.1 uvicorn-0.30.1 uvloop-0.19.0 vllm-0.3.2 watchfiles-0.22.0 websockets-12.0 xformers-0.0.23.post1
Collecting transformers==4.38.0
Downloading transformers-4.38.0-py3-none-any.whl.metadata (131 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.1/131.1 kB 86.5 kB/s eta 0:00:00
Requirement already satisfied: filelock in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (3.14.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (0.23.3)
Requirement already satisfied: numpy>=1.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (24.0)
Requirement already satisfied: pyyaml>=5.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (2024.5.15)
Requirement already satisfied: requests in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (2.32.3)
Collecting tokenizers<0.19,>=0.14 (from transformers==4.38.0)
Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Requirement already satisfied: safetensors>=0.4.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (0.4.3)
Requirement already satisfied: tqdm>=4.27 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (4.66.4)
Requirement already satisfied: fsspec>=2023.5.0 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0) (2024.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (2024.6.2)
Downloading transformers-4.38.0-py3-none-any.whl (8.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 71.0 MB/s eta 0:00:00
Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 95.9 MB/s eta 0:00:00
Installing collected packages: tokenizers, transformers
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.19.1
Uninstalling tokenizers-0.19.1:
Successfully uninstalled tokenizers-0.19.1
Attempting uninstall: transformers
Found existing installation: transformers 4.41.2
Uninstalling transformers-4.41.2:
Successfully uninstalled transformers-4.41.2
Successfully installed tokenizers-0.15.2 transformers-4.38.0
[skypilot.yaml] Done installing packages.
I 06-07 17:10:28 cloud_vm_ray_backend.py:3228] Setup completed.
I 06-07 17:10:28 cloud_vm_ray_backend.py:3414] Multiple resources are specified for the task, using: Azure({'A10': 1}, disk_tier=best, ports=['8000'])
I 06-07 17:10:31 cloud_vm_ray_backend.py:3315] Job submitted with Job ID: 1
I 06-08 00:10:31 log_lib.py:412] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.
INFO: All task resources reserved.
INFO: Reserved IPs: ['<redacted>']
(task, pid=11482) [skypilot.yaml] Listing available conda environments:
(task, pid=11482) # conda environments:
(task, pid=11482) #
(task, pid=11482) base * /home/azureuser/miniconda3
(task, pid=11482) qwen /home/azureuser/miniconda3/envs/qwen
(task, pid=11482)
(task, pid=11482) [skypilot.yaml] Activating conda environment 'qwen'
(task, pid=11482) [skypilot.yaml] Listing available conda environments:
(task, pid=11482) # conda environments:
(task, pid=11482) #
(task, pid=11482) base /home/azureuser/miniconda3
(task, pid=11482) qwen * /home/azureuser/miniconda3/envs/qwen
(task, pid=11482)
(task, pid=11482) [skypilot.yaml] Listing installed packages:
(task, pid=11482) Package Version
(task, pid=11482) ------------------------- ------------
(task, pid=11482) aioprometheus 23.12.0
(task, pid=11482) aiosignal 1.3.1
(task, pid=11482) annotated-types 0.7.0
(task, pid=11482) anyio 4.4.0
(task, pid=11482) attrs 23.2.0
(task, pid=11482) certifi 2024.6.2
(task, pid=11482) charset-normalizer 3.3.2
(task, pid=11482) click 8.1.7
(task, pid=11482) cupy-cuda12x 12.1.0
(task, pid=11482) dnspython 2.6.1
(task, pid=11482) email_validator 2.1.1
(task, pid=11482) exceptiongroup 1.2.1
(task, pid=11482) fastapi 0.111.0
(task, pid=11482) fastapi-cli 0.0.4
(task, pid=11482) fastrlock 0.8.2
(task, pid=11482) filelock 3.14.0
(task, pid=11482) frozenlist 1.4.1
(task, pid=11482) fsspec 2024.6.0
(task, pid=11482) h11 0.14.0
(task, pid=11482) httpcore 1.0.5
(task, pid=11482) httptools 0.6.1
(task, pid=11482) httpx 0.27.0
(task, pid=11482) huggingface-hub 0.23.3
(task, pid=11482) idna 3.7
(task, pid=11482) Jinja2 3.1.4
(task, pid=11482) jsonschema 4.22.0
(task, pid=11482) jsonschema-specifications 2023.12.1
(task, pid=11482) markdown-it-py 3.0.0
(task, pid=11482) MarkupSafe 2.1.5
(task, pid=11482) mdurl 0.1.2
(task, pid=11482) mpmath 1.3.0
(task, pid=11482) msgpack 1.0.8
(task, pid=11482) networkx 3.3
(task, pid=11482) ninja 1.11.1.1
(task, pid=11482) numpy 1.26.4
(task, pid=11482) nvidia-cublas-cu12 12.1.3.1
(task, pid=11482) nvidia-cuda-cupti-cu12 12.1.105
(task, pid=11482) nvidia-cuda-nvrtc-cu12 12.1.105
(task, pid=11482) nvidia-cuda-runtime-cu12 12.1.105
(task, pid=11482) nvidia-cudnn-cu12 8.9.2.26
(task, pid=11482) nvidia-cufft-cu12 11.0.2.54
(task, pid=11482) nvidia-curand-cu12 10.3.2.106
(task, pid=11482) nvidia-cusolver-cu12 11.4.5.107
(task, pid=11482) nvidia-cusparse-cu12 12.1.0.106
(task, pid=11482) nvidia-nccl-cu12 2.18.1
(task, pid=11482) nvidia-nvjitlink-cu12 12.5.40
(task, pid=11482) nvidia-nvtx-cu12 12.1.105
(task, pid=11482) orjson 3.10.3
(task, pid=11482) packaging 24.0
(task, pid=11482) pip 24.0
(task, pid=11482) protobuf 5.27.1
(task, pid=11482) psutil 5.9.8
(task, pid=11482) pydantic 2.7.3
(task, pid=11482) pydantic_core 2.18.4
(task, pid=11482) Pygments 2.18.0
(task, pid=11482) pynvml 11.5.0
(task, pid=11482) python-dotenv 1.0.1
(task, pid=11482) python-multipart 0.0.9
(task, pid=11482) PyYAML 6.0.1
(task, pid=11482) quantile-python 1.1
(task, pid=11482) ray 2.24.0
(task, pid=11482) referencing 0.35.1
(task, pid=11482) regex 2024.5.15
(task, pid=11482) requests 2.32.3
(task, pid=11482) rich 13.7.1
(task, pid=11482) rpds-py 0.18.1
(task, pid=11482) safetensors 0.4.3
(task, pid=11482) sentencepiece 0.2.0
(task, pid=11482) setuptools 69.5.1
(task, pid=11482) shellingham 1.5.4
(task, pid=11482) sniffio 1.3.1
(task, pid=11482) starlette 0.37.2
(task, pid=11482) sympy 1.12.1
(task, pid=11482) tokenizers 0.15.2
(task, pid=11482) torch 2.1.2
(task, pid=11482) tqdm 4.66.4
(task, pid=11482) transformers 4.38.0
(task, pid=11482) triton 2.1.0
(task, pid=11482) typer 0.12.3
(task, pid=11482) typing_extensions 4.12.2
(task, pid=11482) ujson 5.10.0
(task, pid=11482) urllib3 2.2.1
(task, pid=11482) uvicorn 0.30.1
(task, pid=11482) uvloop 0.19.0
(task, pid=11482) vllm 0.3.2
(task, pid=11482) watchfiles 0.22.0
(task, pid=11482) websockets 12.0
(task, pid=11482) wheel 0.43.0
(task, pid=11482) xformers 0.0.23.post1
(task, pid=11482) [skypilot.yaml] Setting PATH to include /sbin
(task, pid=11482) [skypilot.yaml] Starting vllm OpenAI API server with the following configuration:
(task, pid=11482) [skypilot.yaml] - Host: 0.0.0.0
(task, pid=11482) [skypilot.yaml] - Model: Qwen/Qwen1.5-7B-Chat
(task, pid=11482) [skypilot.yaml] - Tensor Parallel Size: 1
(task, pid=11482) [skypilot.yaml] - Maximum Model Length: 1024
(task, pid=11482) INFO 06-08 00:10:36 api_server.py:229] args: Namespace(host='0.0.0.0', port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='Qwen/Qwen1.5-7B-Chat', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=1024, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='cuda', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
(task, pid=11482) INFO 06-08 00:10:36 llm_engine.py:79] Initializing an LLM engine with config: model='Qwen/Qwen1.5-7B-Chat', tokenizer='Qwen/Qwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
(task, pid=11482) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(task, pid=11482) Traceback (most recent call last):
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(task, pid=11482) return _run_code(code, main_globals, None,
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/runpy.py", line 86, in _run_code
(task, pid=11482) exec(code, run_globals)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 237, in <module>
(task, pid=11482) engine = AsyncLLMEngine.from_engine_args(engine_args)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 625, in from_engine_args
(task, pid=11482) engine = cls(parallel_config.worker_use_ray,
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 321, in __init__
(task, pid=11482) self.engine = self._init_engine(*args, **kwargs)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in _init_engine
(task, pid=11482) return engine_class(*args, **kwargs)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 120, in __init__
(task, pid=11482) self._init_workers()
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 163, in _init_workers
(task, pid=11482) self._run_workers("init_model")
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1014, in _run_workers
(task, pid=11482) driver_worker_output = getattr(self.driver_worker,
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in init_model
(task, pid=11482) torch.cuda.set_device(self.device)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
(task, pid=11482) torch._C._cuda_setDevice(device)
(task, pid=11482) File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
(task, pid=11482) torch._C._cuda_init()
(task, pid=11482) RuntimeError: No CUDA GPUs are available
INFO: Job finished (status: SUCCEEDED).
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] Job ID: 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To cancel the job: sky cancel qwen 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To stream job logs: sky logs qwen 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To view the job queue: sky queue qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446]
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] Cluster name: qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To log into the head VM: ssh qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To submit a job: sky exec qwen yaml_file
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To stop the cluster: sky stop qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To teardown the cluster: sky down qwen
Clusters
NAME  LAUNCHED  RESOURCES                                                                    STATUS  AUTOSTOP  COMMAND
qwen  1 hr ago  1x Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8...  UP      -         sky launch -c qwen x.yaml...
Hmm, good catch! Does this problem also happen for other GPU types like A100, or is it an issue with A10 only?
I tested on Standard_NC24ads_A100_v4 and Standard_NV6ads_A10_v5, but it happens on A10 only.
Bug
ubuntu-hpc 22.04 image for a gen 2 instance.
To Reproduce
sky launch -c qwen skypilot.yaml --cloud azure --region westus3
ssh qwen && nvidia-smi
skypilot.yaml (modified from qwen-7b.yaml to add extra logging statements and to use A10 only)
Version & Commit info:
sky -v: skypilot, version 1.0.0.dev20240607
sky -c: skypilot, commit 26d902d7e47900bb6b6c897f6fda79047b35df35
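The `nvidia-smi` step in the reproduction can also be scripted. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH of the cluster VM; `visible_gpus` is a hypothetical helper name and is not part of SkyPilot or vLLM. It asks the driver for its GPU list via `nvidia-smi -L` and returns an empty list when nothing is visible, which may help tell a driver-level problem apart from a PyTorch-level one.

```python
import shutil
import subprocess


def visible_gpus(nvidia_smi: str = "nvidia-smi") -> list[str]:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible.

    `visible_gpus` is a hypothetical helper for illustration only.
    """
    exe = shutil.which(nvidia_smi)
    if exe is None:
        # The driver tools are not installed or not on PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary could not be executed or it hung.
        return []
    if result.returncode != 0:
        # nvidia-smi exits non-zero when it cannot talk to the driver.
        return []
    # Each detected device is printed on its own line starting with "GPU <index>:".
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.startswith("GPU ")
    ]
```

Calling `visible_gpus()` on the head VM before the vLLM server starts and exiting with an error when the list is empty would surface the missing GPU earlier than the `torch.cuda` traceback does.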