skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

No CUDA drivers in Azure A10 #3651

Closed WesleyYue closed 4 months ago

WesleyYue commented 5 months ago

Bug

[Two screenshots attached: "Screenshot 2024-06-07 at 5 03 23 PM" and "Screenshot 2024-06-07 at 5 03 18 PM"]

To Reproduce

  1. Run sky launch -c qwen skypilot.yaml --cloud azure --region westus3
  2. Observe that the launch fails with errors saying that no CUDA drivers were found
  3. Confirm that the CUDA drivers are indeed missing: run ssh qwen, then run nvidia-smi on the VM
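
The manual check in step 3 can also be scripted. The following is a minimal sketch, not part of SkyPilot: `check_nvidia_driver` is a hypothetical helper name, and it assumes that a missing or failing `nvidia-smi` binary is a reasonable proxy for missing drivers. The `which` and `run` parameters exist only so the function can be exercised without a GPU.

```python
import shutil
import subprocess


def check_nvidia_driver(which=shutil.which, run=subprocess.run):
    """Return (ok, message) describing whether nvidia-smi is usable.

    Hypothetical helper for illustration; `which` and `run` are injectable
    so the logic can be tested on a machine without a GPU.
    """
    path = which("nvidia-smi")
    if path is None:
        # Assumption: no nvidia-smi on PATH usually means no driver install.
        return False, "nvidia-smi not found on PATH"
    try:
        result = run([path], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"nvidia-smi could not be run: {exc}"
    if result.returncode != 0:
        # nvidia-smi exits non-zero when it cannot talk to the driver.
        return False, (
            f"nvidia-smi exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return True, "nvidia-smi ran successfully"
```

Calling `check_nvidia_driver()` on the VM (for example at the top of the `setup` section) would surface the problem before the long package download starts, rather than after it.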

skypilot.yaml (modified from qwen-7b.yaml, with extra logging statements and restricted to A10 only)

envs:
  MODEL_NAME: Qwen/Qwen1.5-7B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 1

resources:
  # accelerators: { L4, A10g, A10, L40, A40, A100:1, A100-80GB:1 }
  accelerators: { A10 }
  disk_tier: best
  ports: 8000

setup: |
  echo "[skypilot.yaml] Activating conda environment 'qwen'"
  conda activate qwen
  if [ $? -ne 0 ]; then
    echo "[skypilot.yaml] Creating new conda environment 'qwen' with Python 3.10"
    conda create -n qwen python=3.10 -y
    conda activate qwen
  fi
  echo "[skypilot.yaml] Installing required packages..."
  pip install -U vllm==0.3.2
  pip install -U transformers==4.38.0
  echo "[skypilot.yaml] Done installing packages."

run: |
  echo "[skypilot.yaml] Listing available conda environments:"
  conda env list
  echo "[skypilot.yaml] Activating conda environment 'qwen'"
  conda activate qwen
  echo "[skypilot.yaml] Listing available conda environments:"
  conda env list
  echo "[skypilot.yaml] Listing installed packages:"
  pip list
  echo "[skypilot.yaml] Setting PATH to include /sbin"
  export PATH=$PATH:/sbin
  echo "[skypilot.yaml] Starting vllm OpenAI API server with the following configuration:"
  echo "[skypilot.yaml]   - Host: 0.0.0.0"
  echo "[skypilot.yaml]   - Model: $MODEL_NAME"
  echo "[skypilot.yaml]   - Tensor Parallel Size: $SKYPILOT_NUM_GPUS_PER_NODE"
  echo "[skypilot.yaml]   - Maximum Model Length: 1024"
  python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --model $MODEL_NAME \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
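
For reference, the server command in the `run` section depends on two environment variables, `MODEL_NAME` and `SKYPILOT_NUM_GPUS_PER_NODE`. The sketch below shows one way to assemble the same argument list in Python while validating those variables first. `build_vllm_command` is a hypothetical helper, not something the YAML or SkyPilot provides, and the validation rules are an assumption.

```python
import os


def build_vllm_command(env=None):
    """Build the vLLM server argv used in the `run` section.

    Hypothetical helper for illustration: it mirrors the flags from the
    YAML above and fails early if the expected variables are unusable.
    """
    env = os.environ if env is None else env
    model = env.get("MODEL_NAME")
    if not model:
        raise ValueError("MODEL_NAME is not set")
    raw = env.get("SKYPILOT_NUM_GPUS_PER_NODE", "")
    try:
        num_gpus = int(raw)
    except ValueError:
        raise ValueError(
            f"SKYPILOT_NUM_GPUS_PER_NODE must be an integer, got {raw!r}"
        ) from None
    if num_gpus < 1:
        # Assumption: tensor parallelism needs at least one GPU.
        raise ValueError("SKYPILOT_NUM_GPUS_PER_NODE must be at least 1")
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--host", "0.0.0.0",
        "--model", model,
        "--tensor-parallel-size", str(num_gpus),
        "--max-model-len", "1024",
    ]
```

Passing a plain dictionary as `env` makes it easy to see which command a given configuration would produce without starting the server.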

Version & Commit info:

WesleyYue commented 5 months ago

Full logs:

Task from YAML spec: x.yaml
W 06-07 16:53:05 aws_catalog.py:173] Failed to fetch availability zone mapping. ImportError: Failed to import dependencies for AWS. Try pip install "skypilot[aws]"
W 06-07 16:53:06 aws_catalog.py:173] Failed to fetch availability zone mapping. ImportError: Failed to import dependencies for AWS. Try pip install "skypilot[aws]"
I 06-07 16:53:06 cli.py:1112] Service section will be ignored when using `sky launch`. 
I 06-07 16:53:06 cli.py:1112] To spin up a service, use SkyServe CLI: sky serve up
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A100': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'L4': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A10G': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'A40': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:1264] No resource satisfying Azure({'L40': 1}, disk_tier=best, ports=['8000'], region=westus3) on Azure.
I 06-07 16:53:06 optimizer.py:1268] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4']
I 06-07 16:53:06 optimizer.py:695] == Optimizer ==
I 06-07 16:53:06 optimizer.py:706] Target: minimizing cost
I 06-07 16:53:06 optimizer.py:718] Estimated cost: $0.5 / hour
I 06-07 16:53:06 optimizer.py:718] 
I 06-07 16:53:06 optimizer.py:843] Considered resources (1 node):
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913]  CLOUD   INSTANCE                   vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913]  Azure   Standard_NV6ads_A10_v5     6       55        A10:1          westus3       0.45          ✔     
I 06-07 16:53:06 optimizer.py:913]  Azure   Standard_NC24ads_A100_v4   24      220       A100-80GB:1    westus3       3.67                
I 06-07 16:53:06 optimizer.py:913] -------------------------------------------------------------------------------------------------------
I 06-07 16:53:06 optimizer.py:913] 
I 06-07 16:53:06 optimizer.py:931] Multiple Azure instances satisfy A10:1. The cheapest Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8000']) is considered among:
I 06-07 16:53:06 optimizer.py:931] ['Standard_NV6ads_A10_v5', 'Standard_NV12ads_A10_v5', 'Standard_NV18ads_A10_v5', 'Standard_NV36ads_A10_v5', 'Standard_NV36adms_A10_v5'].
I 06-07 16:53:06 optimizer.py:931] 
I 06-07 16:53:06 optimizer.py:937] To list more details, run 'sky show-gpus A10'.
Launching a new cluster 'qwen'. Proceed? [Y/n]: y
I 06-07 16:53:10 cloud_vm_ray_backend.py:4397] Creating a new cluster: 'qwen' [1x Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8000'])].
I 06-07 16:53:10 cloud_vm_ray_backend.py:4397] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 06-07 16:53:13 cloud_vm_ray_backend.py:1385] To view detailed progress: tail -n100 -f /Users/wesley/sky_logs/sky-2024-06-07-16-53-06-502155/provision.log
I 06-07 16:53:13 cloud_vm_ray_backend.py:1779] Launching on Azure westus3
I 06-07 16:57:13 log_utils.py:45] Head node is up.
I 06-07 17:06:59 cloud_vm_ray_backend.py:1627] Successfully provisioned or found existing VM.
I 06-07 17:07:03 cloud_vm_ray_backend.py:3215] Running setup on 1 node.
[skypilot.yaml] Activating conda environment 'qwen'

EnvironmentNameNotFound: Could not find conda environment: qwen
You can list all discoverable environments with `conda info --envs`.

[skypilot.yaml] Creating new conda environment 'qwen' with Python 3.10
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/azureuser/miniconda3/envs/qwen

  added / updated specs:
    - python=3.10

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bzip2-1.0.8                |       h5eee18b_6         262 KB
    ca-certificates-2024.3.11  |       h06a4308_0         127 KB
    libffi-3.4.4               |       h6a678d5_1         141 KB
    openssl-3.0.13             |       h7f8727e_2         5.2 MB
    pip-24.0                   |  py310h06a4308_0         2.7 MB
    python-3.10.14             |       h955ad1f_1        26.8 MB
    setuptools-69.5.1          |  py310h06a4308_0        1012 KB
    sqlite-3.45.3              |       h5eee18b_0         1.2 MB
    tk-8.6.14                  |       h39e8969_0         3.4 MB
    tzdata-2024a               |       h04d1e81_0         116 KB
    wheel-0.43.0               |  py310h06a4308_0         110 KB
    xz-5.4.6                   |       h5eee18b_1         643 KB
    zlib-1.2.13                |       h5eee18b_1         111 KB
    ------------------------------------------------------------
                                           Total:        41.8 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2024.3.11-h06a4308_0 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 
  libuuid            pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 
  ncurses            pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 
  openssl            pkgs/main/linux-64::openssl-3.0.13-h7f8727e_2 
  pip                pkgs/main/linux-64::pip-24.0-py310h06a4308_0 
  python             pkgs/main/linux-64::python-3.10.14-h955ad1f_1 
  readline           pkgs/main/linux-64::readline-8.2-h5eee18b_0 
  setuptools         pkgs/main/linux-64::setuptools-69.5.1-py310h06a4308_0 
  sqlite             pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 
  tk                 pkgs/main/linux-64::tk-8.6.14-h39e8969_0 
  tzdata             pkgs/main/noarch::tzdata-2024a-h04d1e81_0 
  wheel              pkgs/main/linux-64::wheel-0.43.0-py310h06a4308_0 
  xz                 pkgs/main/linux-64::xz-5.4.6-h5eee18b_1 
  zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 

Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
#
# To activate this environment, use
#
#     $ conda activate qwen
#
# To deactivate an active environment, use
#
#     $ conda deactivate

[skypilot.yaml] Installing required packages...
Collecting vllm==0.3.2
  Downloading vllm-0.3.2-cp310-cp310-manylinux1_x86_64.whl.metadata (7.5 kB)
Collecting ninja (from vllm==0.3.2)
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting psutil (from vllm==0.3.2)
  Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
Collecting ray>=2.9 (from vllm==0.3.2)
  Downloading ray-2.24.0-cp310-cp310-manylinux2014_x86_64.whl.metadata (13 kB)
Collecting sentencepiece (from vllm==0.3.2)
  Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting numpy (from vllm==0.3.2)
  Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.3 MB/s eta 0:00:00
Collecting torch==2.1.2 (from vllm==0.3.2)
  Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting transformers>=4.38.0 (from vllm==0.3.2)
  Downloading transformers-4.41.2-py3-none-any.whl.metadata (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 3.8 MB/s eta 0:00:00
Collecting xformers==0.0.23.post1 (from vllm==0.3.2)
  Downloading xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.0 kB)
Collecting fastapi (from vllm==0.3.2)
  Downloading fastapi-0.111.0-py3-none-any.whl.metadata (25 kB)
Collecting uvicorn[standard] (from vllm==0.3.2)
  Downloading uvicorn-0.30.1-py3-none-any.whl.metadata (6.3 kB)
Collecting pydantic>=2.0 (from vllm==0.3.2)
  Downloading pydantic-2.7.3-py3-none-any.whl.metadata (108 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.0/109.0 kB 5.9 MB/s eta 0:00:00
Collecting aioprometheus[starlette] (from vllm==0.3.2)
  Downloading aioprometheus-23.12.0-py3-none-any.whl.metadata (9.8 kB)
Collecting pynvml==11.5.0 (from vllm==0.3.2)
  Downloading pynvml-11.5.0-py3-none-any.whl.metadata (7.8 kB)
Collecting triton>=2.1.0 (from vllm==0.3.2)
  Downloading triton-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting cupy-cuda12x==12.1.0 (from vllm==0.3.2)
  Downloading cupy_cuda12x-12.1.0-cp310-cp310-manylinux2014_x86_64.whl.metadata (2.6 kB)
Collecting fastrlock>=0.5 (from cupy-cuda12x==12.1.0->vllm==0.3.2)
  Downloading fastrlock-0.8.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata (9.3 kB)
Collecting filelock (from torch==2.1.2->vllm==0.3.2)
  Downloading filelock-3.14.0-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions (from torch==2.1.2->vllm==0.3.2)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch==2.1.2->vllm==0.3.2)
  Downloading sympy-1.12.1-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch==2.1.2->vllm==0.3.2)
  Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting jinja2 (from torch==2.1.2->vllm==0.3.2)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch==2.1.2->vllm==0.3.2)
  Downloading fsspec-2024.6.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.18.1 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)
Collecting triton>=2.1.0 (from vllm==0.3.2)
  Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.1.2->vllm==0.3.2)
  Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting annotated-types>=0.4.0 (from pydantic>=2.0->vllm==0.3.2)
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.18.4 (from pydantic>=2.0->vllm==0.3.2)
  Downloading pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Collecting click>=7.0 (from ray>=2.9->vllm==0.3.2)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting jsonschema (from ray>=2.9->vllm==0.3.2)
  Downloading jsonschema-4.22.0-py3-none-any.whl.metadata (8.2 kB)
Collecting msgpack<2.0.0,>=1.0.0 (from ray>=2.9->vllm==0.3.2)
  Downloading msgpack-1.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Collecting packaging (from ray>=2.9->vllm==0.3.2)
  Downloading packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting protobuf!=3.19.5,>=3.15.3 (from ray>=2.9->vllm==0.3.2)
  Downloading protobuf-5.27.1-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting pyyaml (from ray>=2.9->vllm==0.3.2)
  Downloading PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting aiosignal (from ray>=2.9->vllm==0.3.2)
  Downloading aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecting frozenlist (from ray>=2.9->vllm==0.3.2)
  Downloading frozenlist-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting requests (from ray>=2.9->vllm==0.3.2)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting huggingface-hub<1.0,>=0.23.0 (from transformers>=4.38.0->vllm==0.3.2)
  Downloading huggingface_hub-0.23.3-py3-none-any.whl.metadata (12 kB)
Collecting regex!=2019.12.17 (from transformers>=4.38.0->vllm==0.3.2)
  Downloading regex-2024.5.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 kB 2.7 MB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19 (from transformers>=4.38.0->vllm==0.3.2)
  Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers>=4.38.0->vllm==0.3.2)
  Downloading safetensors-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting tqdm>=4.27 (from transformers>=4.38.0->vllm==0.3.2)
  Downloading tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 5.6 MB/s eta 0:00:00
Collecting orjson (from aioprometheus[starlette]->vllm==0.3.2)
  Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.7/49.7 kB 4.7 MB/s eta 0:00:00
Collecting quantile-python>=1.1 (from aioprometheus[starlette]->vllm==0.3.2)
  Downloading quantile-python-1.1.tar.gz (2.9 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting starlette>=0.14.2 (from aioprometheus[starlette]->vllm==0.3.2)
  Downloading starlette-0.37.2-py3-none-any.whl.metadata (5.9 kB)
Collecting fastapi-cli>=0.0.2 (from fastapi->vllm==0.3.2)
  Downloading fastapi_cli-0.0.4-py3-none-any.whl.metadata (7.0 kB)
Collecting httpx>=0.23.0 (from fastapi->vllm==0.3.2)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting python-multipart>=0.0.7 (from fastapi->vllm==0.3.2)
  Downloading python_multipart-0.0.9-py3-none-any.whl.metadata (2.5 kB)
Collecting ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1 (from fastapi->vllm==0.3.2)
  Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting email_validator>=2.0.0 (from fastapi->vllm==0.3.2)
  Downloading email_validator-2.1.1-py3-none-any.whl.metadata (26 kB)
Collecting h11>=0.8 (from uvicorn[standard]->vllm==0.3.2)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting httptools>=0.5.0 (from uvicorn[standard]->vllm==0.3.2)
  Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting python-dotenv>=0.13 (from uvicorn[standard]->vllm==0.3.2)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0 (from uvicorn[standard]->vllm==0.3.2)
  Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting watchfiles>=0.13 (from uvicorn[standard]->vllm==0.3.2)
  Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting websockets>=10.4 (from uvicorn[standard]->vllm==0.3.2)
  Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting dnspython>=2.0.0 (from email_validator>=2.0.0->fastapi->vllm==0.3.2)
  Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Collecting idna>=2.0.0 (from email_validator>=2.0.0->fastapi->vllm==0.3.2)
  Downloading idna-3.7-py3-none-any.whl.metadata (9.9 kB)
Collecting typer>=0.12.3 (from fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading typer-0.12.3-py3-none-any.whl.metadata (15 kB)
Collecting anyio (from httpx>=0.23.0->fastapi->vllm==0.3.2)
  Downloading anyio-4.4.0-py3-none-any.whl.metadata (4.6 kB)
Collecting certifi (from httpx>=0.23.0->fastapi->vllm==0.3.2)
  Downloading certifi-2024.6.2-py3-none-any.whl.metadata (2.2 kB)
Collecting httpcore==1.* (from httpx>=0.23.0->fastapi->vllm==0.3.2)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting sniffio (from httpx>=0.23.0->fastapi->vllm==0.3.2)
  Downloading sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch==2.1.2->vllm==0.3.2)
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting attrs>=22.2.0 (from jsonschema->ray>=2.9->vllm==0.3.2)
  Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting jsonschema-specifications>=2023.03.6 (from jsonschema->ray>=2.9->vllm==0.3.2)
  Downloading jsonschema_specifications-2023.12.1-py3-none-any.whl.metadata (3.0 kB)
Collecting referencing>=0.28.4 (from jsonschema->ray>=2.9->vllm==0.3.2)
  Downloading referencing-0.35.1-py3-none-any.whl.metadata (2.8 kB)
Collecting rpds-py>=0.7.1 (from jsonschema->ray>=2.9->vllm==0.3.2)
  Downloading rpds_py-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting charset-normalizer<4,>=2 (from requests->ray>=2.9->vllm==0.3.2)
  Downloading charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (33 kB)
Collecting urllib3<3,>=1.21.1 (from requests->ray>=2.9->vllm==0.3.2)
  Downloading urllib3-2.2.1-py3-none-any.whl.metadata (6.4 kB)
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch==2.1.2->vllm==0.3.2)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting exceptiongroup>=1.0.2 (from anyio->httpx>=0.23.0->fastapi->vllm==0.3.2)
  Downloading exceptiongroup-1.2.1-py3-none-any.whl.metadata (6.6 kB)
Collecting shellingham>=1.3.0 (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting rich>=10.11.0 (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pygments<3.0.0,>=2.13.0 (from rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading pygments-2.18.0-py3-none-any.whl.metadata (2.5 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->vllm==0.3.2)
  Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Downloading vllm-0.3.2-cp310-cp310-manylinux1_x86_64.whl (41.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.4/41.4 MB 16.2 MB/s eta 0:00:00
Downloading cupy_cuda12x-12.1.0-cp310-cp310-manylinux2014_x86_64.whl (83.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.0/83.0 MB 8.2 MB/s eta 0:00:00
Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 3.7 MB/s eta 0:00:00
Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 1.4 MB/s eta 0:00:00
Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 7.9 MB/s eta 0:00:00
Downloading xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl (213.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.0/213.0 MB 4.1 MB/s eta 0:00:00
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 2.3 MB/s eta 0:00:00
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 10.0 MB/s eta 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 4.9 MB/s eta 0:00:00
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 60.0 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 943.0 kB/s eta 0:00:00
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 1.1 MB/s eta 0:00:00
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 2.4 MB/s eta 0:00:00
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 4.9 MB/s eta 0:00:00
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 1.1 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl (209.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.8/209.8 MB 4.2 MB/s eta 0:00:00
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 8.1 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 39.6 MB/s eta 0:00:00
Downloading pydantic-2.7.3-py3-none-any.whl (409 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 409.6/409.6 kB 36.3 MB/s eta 0:00:00
Downloading pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 69.9 MB/s eta 0:00:00
Downloading ray-2.24.0-cp310-cp310-manylinux2014_x86_64.whl (65.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.9/65.9 MB 10.0 MB/s eta 0:00:00
Downloading transformers-4.41.2-py3-none-any.whl (9.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.1/9.1 MB 109.7 MB/s eta 0:00:00
Downloading fastapi-0.111.0-py3-none-any.whl (91 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.0/92.0 kB 9.8 MB/s eta 0:00:00
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 34.0 MB/s eta 0:00:00
Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 kB 26.0 MB/s eta 0:00:00
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 58.0 MB/s eta 0:00:00
Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Downloading click-8.1.7-py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 10.3 MB/s eta 0:00:00
Downloading email_validator-2.1.1-py3-none-any.whl (30 kB)
Downloading fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)
Downloading fastrlock-0.8.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (51 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.3/51.3 kB 4.4 MB/s eta 0:00:00
Downloading h11-0.14.0-py3-none-any.whl (58 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 5.9 MB/s eta 0:00:00
Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 341.4/341.4 kB 33.6 MB/s eta 0:00:00
Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 8.2 MB/s eta 0:00:00
Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 9.0 MB/s eta 0:00:00
Downloading huggingface_hub-0.23.3-py3-none-any.whl (401 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 401.7/401.7 kB 37.1 MB/s eta 0:00:00
Downloading fsspec-2024.6.0-py3-none-any.whl (176 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.9/176.9 kB 19.4 MB/s eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 14.1 MB/s eta 0:00:00
Downloading msgpack-1.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 385.1/385.1 kB 34.0 MB/s eta 0:00:00
Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.5/142.5 kB 14.9 MB/s eta 0:00:00
Downloading packaging-24.0-py3-none-any.whl (53 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.5/53.5 kB 5.3 MB/s eta 0:00:00
Downloading protobuf-5.27.1-cp38-abi3-manylinux2014_x86_64.whl (309 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 309.2/309.2 kB 32.3 MB/s eta 0:00:00
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)
Downloading PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 705.5/705.5 kB 25.2 MB/s eta 0:00:00
Downloading regex-2024.5.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (775 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 775.1/775.1 kB 58.5 MB/s eta 0:00:00
Downloading safetensors-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 85.8 MB/s eta 0:00:00
Downloading starlette-0.37.2-py3-none-any.whl (71 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.9/71.9 kB 7.5 MB/s eta 0:00:00
Downloading tokenizers-0.19.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 120.6 MB/s eta 0:00:00
Downloading tqdm-4.66.4-py3-none-any.whl (78 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 kB 7.3 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.6/53.6 kB 5.6 MB/s eta 0:00:00
Downloading uvicorn-0.30.1-py3-none-any.whl (62 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.4/62.4 kB 6.7 MB/s eta 0:00:00
Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 117.1 MB/s eta 0:00:00
Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 87.2 MB/s eta 0:00:00
Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.2/130.2 kB 12.7 MB/s eta 0:00:00
Downloading aioprometheus-23.12.0-py3-none-any.whl (31 kB)
Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Downloading frozenlist-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (239 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 239.5/239.5 kB 26.1 MB/s eta 0:00:00
Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
Downloading jsonschema-4.22.0-py3-none-any.whl (88 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.3/88.3 kB 9.6 MB/s eta 0:00:00
Downloading networkx-3.3-py3-none-any.whl (1.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 66.2 MB/s eta 0:00:00
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 7.1 MB/s eta 0:00:00
Downloading sympy-1.12.1-py3-none-any.whl (5.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 108.7 MB/s eta 0:00:00
Downloading anyio-4.4.0-py3-none-any.whl (86 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 8.5 MB/s eta 0:00:00
Downloading attrs-23.2.0-py3-none-any.whl (60 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 6.8 MB/s eta 0:00:00
Downloading certifi-2024.6.2-py3-none-any.whl (164 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.4/164.4 kB 17.5 MB/s eta 0:00:00
Downloading charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.1/142.1 kB 15.0 MB/s eta 0:00:00
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.7/307.7 kB 30.3 MB/s eta 0:00:00
Downloading idna-3.7-py3-none-any.whl (66 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.8/66.8 kB 6.3 MB/s eta 0:00:00
Downloading jsonschema_specifications-2023.12.1-py3-none-any.whl (18 kB)
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 49.7 MB/s eta 0:00:00
Downloading referencing-0.35.1-py3-none-any.whl (26 kB)
Downloading rpds_py-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 80.7 MB/s eta 0:00:00
Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Downloading typer-0.12.3-py3-none-any.whl (47 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.2/47.2 kB 4.8 MB/s eta 0:00:00
Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.1/121.1 kB 12.6 MB/s eta 0:00:00
Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (21.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.3/21.3 MB 29.4 MB/s eta 0:00:00
Downloading exceptiongroup-1.2.1-py3-none-any.whl (16 kB)
Downloading rich-13.7.1-py3-none-any.whl (240 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.7/240.7 kB 24.6 MB/s eta 0:00:00
Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.5/87.5 kB 9.1 MB/s eta 0:00:00
Downloading pygments-2.18.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 83.0 MB/s eta 0:00:00
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Building wheels for collected packages: quantile-python
  Building wheel for quantile-python (setup.py): started
  Building wheel for quantile-python (setup.py): finished with status 'done'
  Created wheel for quantile-python: filename=quantile_python-1.1-py3-none-any.whl size=3443 sha256=8709fab3c63a2c2ac773179e20d77a3cf88eec45ddd05b8555fceb93a5c07052
  Stored in directory: /home/azureuser/.cache/pip/wheels/6d/f4/0a/0e7d01548a005f9f3fa23101f071d248da052f2a9bf2fe11c6
Successfully built quantile-python
Installing collected packages: sentencepiece, quantile-python, ninja, mpmath, fastrlock, websockets, uvloop, urllib3, ujson, typing-extensions, tqdm, sympy, sniffio, shellingham, safetensors, rpds-py, regex, pyyaml, python-multipart, python-dotenv, pynvml, pygments, psutil, protobuf, packaging, orjson, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, msgpack, mdurl, MarkupSafe, idna, httptools, h11, fsspec, frozenlist, filelock, exceptiongroup, dnspython, click, charset-normalizer, certifi, attrs, annotated-types, uvicorn, triton, requests, referencing, pydantic-core, nvidia-cusparse-cu12, nvidia-cudnn-cu12, markdown-it-py, jinja2, httpcore, email_validator, cupy-cuda12x, anyio, aiosignal, aioprometheus, watchfiles, starlette, rich, pydantic, nvidia-cusolver-cu12, jsonschema-specifications, huggingface-hub, httpx, typer, torch, tokenizers, jsonschema, xformers, transformers, ray, fastapi-cli, fastapi, vllm
Successfully installed MarkupSafe-2.1.5 aioprometheus-23.12.0 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.4.0 attrs-23.2.0 certifi-2024.6.2 charset-normalizer-3.3.2 click-8.1.7 cupy-cuda12x-12.1.0 dnspython-2.6.1 email_validator-2.1.1 exceptiongroup-1.2.1 fastapi-0.111.0 fastapi-cli-0.0.4 fastrlock-0.8.2 filelock-3.14.0 frozenlist-1.4.1 fsspec-2024.6.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 huggingface-hub-0.23.3 idna-3.7 jinja2-3.1.4 jsonschema-4.22.0 jsonschema-specifications-2023.12.1 markdown-it-py-3.0.0 mdurl-0.1.2 mpmath-1.3.0 msgpack-1.0.8 networkx-3.3 ninja-1.11.1.1 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 orjson-3.10.3 packaging-24.0 protobuf-5.27.1 psutil-5.9.8 pydantic-2.7.3 pydantic-core-2.18.4 pygments-2.18.0 pynvml-11.5.0 python-dotenv-1.0.1 python-multipart-0.0.9 pyyaml-6.0.1 quantile-python-1.1 ray-2.24.0 referencing-0.35.1 regex-2024.5.15 requests-2.32.3 rich-13.7.1 rpds-py-0.18.1 safetensors-0.4.3 sentencepiece-0.2.0 shellingham-1.5.4 sniffio-1.3.1 starlette-0.37.2 sympy-1.12.1 tokenizers-0.19.1 torch-2.1.2 tqdm-4.66.4 transformers-4.41.2 triton-2.1.0 typer-0.12.3 typing-extensions-4.12.2 ujson-5.10.0 urllib3-2.2.1 uvicorn-0.30.1 uvloop-0.19.0 vllm-0.3.2 watchfiles-0.22.0 websockets-12.0 xformers-0.0.23.post1
Collecting transformers==4.38.0
  Downloading transformers-4.38.0-py3-none-any.whl.metadata (131 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.1/131.1 kB 86.5 kB/s eta 0:00:00
Requirement already satisfied: filelock in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (3.14.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (0.23.3)
Requirement already satisfied: numpy>=1.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (24.0)
Requirement already satisfied: pyyaml>=5.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (2024.5.15)
Requirement already satisfied: requests in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (2.32.3)
Collecting tokenizers<0.19,>=0.14 (from transformers==4.38.0)
  Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Requirement already satisfied: safetensors>=0.4.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (0.4.3)
Requirement already satisfied: tqdm>=4.27 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from transformers==4.38.0) (4.66.4)
Requirement already satisfied: fsspec>=2023.5.0 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0) (2024.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages (from requests->transformers==4.38.0) (2024.6.2)
Downloading transformers-4.38.0-py3-none-any.whl (8.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 71.0 MB/s eta 0:00:00
Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 95.9 MB/s eta 0:00:00
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.19.1
    Uninstalling tokenizers-0.19.1:
      Successfully uninstalled tokenizers-0.19.1
  Attempting uninstall: transformers
    Found existing installation: transformers 4.41.2
    Uninstalling transformers-4.41.2:
      Successfully uninstalled transformers-4.41.2
Successfully installed tokenizers-0.15.2 transformers-4.38.0
[skypilot.yaml] Done installing packages.
I 06-07 17:10:28 cloud_vm_ray_backend.py:3228] Setup completed.
I 06-07 17:10:28 cloud_vm_ray_backend.py:3414] Multiple resources are specified for the task, using: Azure({'A10': 1}, disk_tier=best, ports=['8000'])
I 06-07 17:10:31 cloud_vm_ray_backend.py:3315] Job submitted with Job ID: 1
I 06-08 00:10:31 log_lib.py:412] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.
INFO: All task resources reserved.
INFO: Reserved IPs: ['<redacted>']
(task, pid=11482) [skypilot.yaml] Listing available conda environments:
(task, pid=11482) # conda environments:
(task, pid=11482) #
(task, pid=11482) base                  *  /home/azureuser/miniconda3
(task, pid=11482) qwen                     /home/azureuser/miniconda3/envs/qwen
(task, pid=11482) 
(task, pid=11482) [skypilot.yaml] Activating conda environment 'qwen'
(task, pid=11482) [skypilot.yaml] Listing available conda environments:
(task, pid=11482) # conda environments:
(task, pid=11482) #
(task, pid=11482) base                     /home/azureuser/miniconda3
(task, pid=11482) qwen                  *  /home/azureuser/miniconda3/envs/qwen
(task, pid=11482) 
(task, pid=11482) [skypilot.yaml] Listing installed packages:
(task, pid=11482) Package                   Version
(task, pid=11482) ------------------------- ------------
(task, pid=11482) aioprometheus             23.12.0
(task, pid=11482) aiosignal                 1.3.1
(task, pid=11482) annotated-types           0.7.0
(task, pid=11482) anyio                     4.4.0
(task, pid=11482) attrs                     23.2.0
(task, pid=11482) certifi                   2024.6.2
(task, pid=11482) charset-normalizer        3.3.2
(task, pid=11482) click                     8.1.7
(task, pid=11482) cupy-cuda12x              12.1.0
(task, pid=11482) dnspython                 2.6.1
(task, pid=11482) email_validator           2.1.1
(task, pid=11482) exceptiongroup            1.2.1
(task, pid=11482) fastapi                   0.111.0
(task, pid=11482) fastapi-cli               0.0.4
(task, pid=11482) fastrlock                 0.8.2
(task, pid=11482) filelock                  3.14.0
(task, pid=11482) frozenlist                1.4.1
(task, pid=11482) fsspec                    2024.6.0
(task, pid=11482) h11                       0.14.0
(task, pid=11482) httpcore                  1.0.5
(task, pid=11482) httptools                 0.6.1
(task, pid=11482) httpx                     0.27.0
(task, pid=11482) huggingface-hub           0.23.3
(task, pid=11482) idna                      3.7
(task, pid=11482) Jinja2                    3.1.4
(task, pid=11482) jsonschema                4.22.0
(task, pid=11482) jsonschema-specifications 2023.12.1
(task, pid=11482) markdown-it-py            3.0.0
(task, pid=11482) MarkupSafe                2.1.5
(task, pid=11482) mdurl                     0.1.2
(task, pid=11482) mpmath                    1.3.0
(task, pid=11482) msgpack                   1.0.8
(task, pid=11482) networkx                  3.3
(task, pid=11482) ninja                     1.11.1.1
(task, pid=11482) numpy                     1.26.4
(task, pid=11482) nvidia-cublas-cu12        12.1.3.1
(task, pid=11482) nvidia-cuda-cupti-cu12    12.1.105
(task, pid=11482) nvidia-cuda-nvrtc-cu12    12.1.105
(task, pid=11482) nvidia-cuda-runtime-cu12  12.1.105
(task, pid=11482) nvidia-cudnn-cu12         8.9.2.26
(task, pid=11482) nvidia-cufft-cu12         11.0.2.54
(task, pid=11482) nvidia-curand-cu12        10.3.2.106
(task, pid=11482) nvidia-cusolver-cu12      11.4.5.107
(task, pid=11482) nvidia-cusparse-cu12      12.1.0.106
(task, pid=11482) nvidia-nccl-cu12          2.18.1
(task, pid=11482) nvidia-nvjitlink-cu12     12.5.40
(task, pid=11482) nvidia-nvtx-cu12          12.1.105
(task, pid=11482) orjson                    3.10.3
(task, pid=11482) packaging                 24.0
(task, pid=11482) pip                       24.0
(task, pid=11482) protobuf                  5.27.1
(task, pid=11482) psutil                    5.9.8
(task, pid=11482) pydantic                  2.7.3
(task, pid=11482) pydantic_core             2.18.4
(task, pid=11482) Pygments                  2.18.0
(task, pid=11482) pynvml                    11.5.0
(task, pid=11482) python-dotenv             1.0.1
(task, pid=11482) python-multipart          0.0.9
(task, pid=11482) PyYAML                    6.0.1
(task, pid=11482) quantile-python           1.1
(task, pid=11482) ray                       2.24.0
(task, pid=11482) referencing               0.35.1
(task, pid=11482) regex                     2024.5.15
(task, pid=11482) requests                  2.32.3
(task, pid=11482) rich                      13.7.1
(task, pid=11482) rpds-py                   0.18.1
(task, pid=11482) safetensors               0.4.3
(task, pid=11482) sentencepiece             0.2.0
(task, pid=11482) setuptools                69.5.1
(task, pid=11482) shellingham               1.5.4
(task, pid=11482) sniffio                   1.3.1
(task, pid=11482) starlette                 0.37.2
(task, pid=11482) sympy                     1.12.1
(task, pid=11482) tokenizers                0.15.2
(task, pid=11482) torch                     2.1.2
(task, pid=11482) tqdm                      4.66.4
(task, pid=11482) transformers              4.38.0
(task, pid=11482) triton                    2.1.0
(task, pid=11482) typer                     0.12.3
(task, pid=11482) typing_extensions         4.12.2
(task, pid=11482) ujson                     5.10.0
(task, pid=11482) urllib3                   2.2.1
(task, pid=11482) uvicorn                   0.30.1
(task, pid=11482) uvloop                    0.19.0
(task, pid=11482) vllm                      0.3.2
(task, pid=11482) watchfiles                0.22.0
(task, pid=11482) websockets                12.0
(task, pid=11482) wheel                     0.43.0
(task, pid=11482) xformers                  0.0.23.post1
(task, pid=11482) [skypilot.yaml] Setting PATH to include /sbin
(task, pid=11482) [skypilot.yaml] Starting vllm OpenAI API server with the following configuration:
(task, pid=11482) [skypilot.yaml]   - Host: 0.0.0.0
(task, pid=11482) [skypilot.yaml]   - Model: Qwen/Qwen1.5-7B-Chat
(task, pid=11482) [skypilot.yaml]   - Tensor Parallel Size: 1
(task, pid=11482) [skypilot.yaml]   - Maximum Model Length: 1024
(task, pid=11482) INFO 06-08 00:10:36 api_server.py:229] args: Namespace(host='0.0.0.0', port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='Qwen/Qwen1.5-7B-Chat', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=1024, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='cuda', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
(task, pid=11482) INFO 06-08 00:10:36 llm_engine.py:79] Initializing an LLM engine with config: model='Qwen/Qwen1.5-7B-Chat', tokenizer='Qwen/Qwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
(task, pid=11482) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(task, pid=11482) Traceback (most recent call last):
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(task, pid=11482)     return _run_code(code, main_globals, None,
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/runpy.py", line 86, in _run_code
(task, pid=11482)     exec(code, run_globals)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 237, in <module>
(task, pid=11482)     engine = AsyncLLMEngine.from_engine_args(engine_args)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 625, in from_engine_args
(task, pid=11482)     engine = cls(parallel_config.worker_use_ray,
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 321, in __init__
(task, pid=11482)     self.engine = self._init_engine(*args, **kwargs)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in _init_engine
(task, pid=11482)     return engine_class(*args, **kwargs)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 120, in __init__
(task, pid=11482)     self._init_workers()
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 163, in _init_workers
(task, pid=11482)     self._run_workers("init_model")
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1014, in _run_workers
(task, pid=11482)     driver_worker_output = getattr(self.driver_worker,
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in init_model
(task, pid=11482)     torch.cuda.set_device(self.device)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/torch/cuda/__init__.py", line 404, in set_device
(task, pid=11482)     torch._C._cuda_setDevice(device)
(task, pid=11482)   File "/home/azureuser/miniconda3/envs/qwen/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
(task, pid=11482)     torch._C._cuda_init()
(task, pid=11482) RuntimeError: No CUDA GPUs are available
INFO: Job finished (status: SUCCEEDED).
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] Job ID: 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To cancel the job:       sky cancel qwen 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To stream job logs:      sky logs qwen 1
I 06-07 17:10:43 cloud_vm_ray_backend.py:3350] To view the job queue:   sky queue qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] 
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] Cluster name: qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To log into the head VM: ssh qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To submit a job:         sky exec qwen yaml_file
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To stop the cluster:     sky stop qwen
I 06-07 17:10:43 cloud_vm_ray_backend.py:3446] To teardown the cluster: sky down qwen
Clusters
NAME  LAUNCHED  RESOURCES                                                                  STATUS  AUTOSTOP  COMMAND                       
qwen  1 hr ago  1x Azure(Standard_NV6ads_A10_v5, {'A10': 1}, disk_tier=best, ports=['8...  UP      -         sky launch -c qwen x.yaml...  
Michaelvll commented 5 months ago

Hmm, good catch! Does this problem also happen for other GPU types like A100, or is it an issue with A10 only?

WesleyYue commented 5 months ago

I tested on both Standard_NC24ads_A100_v4 and Standard_NV6ads_A10_v5; it only happens on the A10.
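
For anyone repeating this comparison across instance types, here is a minimal sketch of a driver check that could be run on each VM after `ssh qwen`. It is not part of SkyPilot; the function name `gpu_driver_status` is hypothetical, and it assumes only that `nvidia-smi -L` lists GPUs when the driver is installed and loaded.

```python
import shutil
import subprocess


def gpu_driver_status(smi_binary: str = "nvidia-smi") -> str:
    """Classify the NVIDIA driver state on this machine.

    Returns "missing" if the binary is not on PATH, "ok" if it lists
    at least one GPU, and "error" for any other outcome.
    """
    path = shutil.which(smi_binary)
    if path is None:
        # No nvidia-smi on PATH: the driver package is most likely not installed.
        return "missing"
    try:
        # "-L" asks nvidia-smi to list the GPUs it can see, one per line.
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return "error"
    if result.returncode == 0 and "GPU" in result.stdout:
        return "ok"
    return "error"


if __name__ == "__main__":
    print(gpu_driver_status())
```

A result of `missing` or `error` on the A10 VM alongside `ok` on the A100 VM would match what I saw above.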