vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Unable to build image from `vllm` repo Dockerfile #6916

Open · tadamcz opened this issue 4 months ago

tadamcz commented 4 months ago

Your current environment

Not applicable -- Dockerfile.

🐛 Describe the bug

Steps to reproduce:

❯ docker build . --target vllm-base
[+] Building 4.4s (23/51)                                                                                              docker:desktop-linux
 => [internal] load .dockerignore                                                                                                      0.0s
 => => transferring context: 50B                                                                                                       0.0s
 => [internal] load build definition from Dockerfile                                                                                   0.0s
 => => transferring dockerfile: 8.97kB                                                                                                 0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04                                                         0.8s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04                                                        0.9s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io                                                                             0.0s
 => [internal] load build context                                                                                                      0.0s
 => => transferring context: 31.49kB                                                                                                   0.0s
 => [base  1/13] FROM docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc  0.0s
 => [vllm-base  1/10] FROM docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c74  0.0s
 => CACHED [vllm-base  2/10] WORKDIR /vllm-workspace                                                                                   0.0s
 => CACHED [vllm-base  3/10] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/  0.0s
 => CACHED [vllm-base  4/10] RUN apt-get update -y     && apt-get install -y python3-pip git vim curl libibverbs-dev                   0.0s
 => CACHED [vllm-base  5/10] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                            0.0s
 => CACHED [vllm-base  6/10] RUN python3 -m pip --version                                                                              0.0s
 => CACHED [vllm-base  7/10] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                       0.0s
 => CACHED [base  2/13] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/Ameri  0.0s
 => CACHED [base  3/13] RUN apt-get update -y     && apt-get install -y git curl sudo                                                  0.0s
 => CACHED [base  4/13] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                                 0.0s
 => CACHED [base  5/13] RUN python3 -m pip --version                                                                                   0.0s
 => CACHED [base  6/13] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                            0.0s
 => CACHED [base  7/13] WORKDIR /workspace                                                                                             0.0s
 => CACHED [base  8/13] COPY requirements-common.txt requirements-common.txt                                                           0.0s
 => CACHED [base  9/13] COPY requirements-cuda.txt requirements-cuda.txt                                                               0.0s
 => ERROR [base 10/13] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-cuda.txt              3.5s
------
 > [base 10/13] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-cuda.txt:
0.455 Collecting cmake>=3.21 (from -r requirements-common.txt (line 1))
0.456   Using cached cmake-3.30.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
0.484 Collecting ninja (from -r requirements-common.txt (line 2))
0.485   Using cached ninja-1.11.1.1-py2.py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.metadata (5.3 kB)
0.547 Collecting psutil (from -r requirements-common.txt (line 3))
0.548   Using cached psutil-6.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (21 kB)
0.592 Collecting sentencepiece (from -r requirements-common.txt (line 4))
0.593   Using cached sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.7 kB)
0.720 Collecting numpy<2.0.0 (from -r requirements-common.txt (line 5))
0.721   Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)
0.724 Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from -r requirements-common.txt (line 6)) (2.22.0)
0.770 Collecting tqdm (from -r requirements-common.txt (line 7))
0.771   Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
0.787 Collecting py-cpuinfo (from -r requirements-common.txt (line 8))
0.788   Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
0.825 Collecting transformers>=4.43.2 (from -r requirements-common.txt (line 9))
0.826   Using cached transformers-4.43.3-py3-none-any.whl.metadata (43 kB)
1.094 Collecting tokenizers>=0.19.1 (from -r requirements-common.txt (line 10))
1.095   Using cached tokenizers-0.19.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.7 kB)
1.141 Collecting fastapi (from -r requirements-common.txt (line 11))
1.142   Using cached fastapi-0.111.1-py3-none-any.whl.metadata (26 kB)
1.316 Collecting aiohttp (from -r requirements-common.txt (line 12))
1.317   Using cached aiohttp-3.9.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.5 kB)
1.378 Collecting openai (from -r requirements-common.txt (line 13))
1.379   Using cached openai-1.37.1-py3-none-any.whl.metadata (22 kB)
1.463 Collecting pydantic>=2.0 (from -r requirements-common.txt (line 15))
1.464   Using cached pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
1.595 Collecting pillow (from -r requirements-common.txt (line 16))
1.596   Using cached pillow-10.4.0-cp310-cp310-manylinux_2_28_aarch64.whl.metadata (9.2 kB)
1.617 Collecting prometheus_client>=0.18.0 (from -r requirements-common.txt (line 17))
1.618   Using cached prometheus_client-0.20.0-py3-none-any.whl.metadata (1.8 kB)
1.639 Collecting prometheus-fastapi-instrumentator>=7.0.0 (from -r requirements-common.txt (line 18))
1.640   Using cached prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl.metadata (13 kB)
1.669 Collecting tiktoken>=0.6.0 (from -r requirements-common.txt (line 19))
1.670   Using cached tiktoken-0.7.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.6 kB)
1.696 Collecting lm-format-enforcer==0.10.3 (from -r requirements-common.txt (line 20))
1.697   Using cached lm_format_enforcer-0.10.3-py3-none-any.whl.metadata (16 kB)
1.719 Collecting outlines<0.1,>=0.0.43 (from -r requirements-common.txt (line 21))
1.720   Using cached outlines-0.0.46-py3-none-any.whl.metadata (15 kB)
1.741 Collecting typing_extensions (from -r requirements-common.txt (line 22))
1.742   Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
1.763 Collecting filelock>=3.10.4 (from -r requirements-common.txt (line 23))
1.765   Using cached filelock-3.15.4-py3-none-any.whl.metadata (2.9 kB)
1.946 Collecting pyzmq (from -r requirements-common.txt (line 24))
1.947   Using cached pyzmq-26.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
2.015 Collecting ray>=2.9 (from -r requirements-cuda.txt (line 5))
2.016   Using cached ray-2.33.0-cp310-cp310-manylinux2014_aarch64.whl.metadata (13 kB)
2.058 Collecting nvidia-ml-py (from -r requirements-cuda.txt (line 6))
2.058   Using cached nvidia_ml_py-12.555.43-py3-none-any.whl.metadata (8.6 kB)
2.090 Collecting torch==2.3.1 (from -r requirements-cuda.txt (line 7))
2.091   Using cached torch-2.3.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (26 kB)
2.133 Collecting torchvision==0.18.1 (from -r requirements-cuda.txt (line 9))
2.134   Using cached torchvision-0.18.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (6.6 kB)
2.237 Collecting xformers==0.0.27 (from -r requirements-cuda.txt (line 10))
2.238   Using cached xformers-0.0.27.tar.gz (4.4 MB)
3.035   Preparing metadata (setup.py): started
3.126   Preparing metadata (setup.py): finished with status 'error'
3.128   error: subprocess-exited-with-error
3.128
3.128   × python setup.py egg_info did not run successfully.
3.128   │ exit code: 1
3.128   ╰─> [6 lines of output]
3.128       Traceback (most recent call last):
3.128         File "<string>", line 2, in <module>
3.128         File "<pip-setuptools-caller>", line 34, in <module>
3.128         File "/tmp/pip-install-4k3lbg51/xformers_64f2d3dd67514545bfce503117198306/setup.py", line 24, in <module>
3.128           import torch
3.128       ModuleNotFoundError: No module named 'torch'
3.128       [end of output]
3.128
3.128   note: This error originates from a subprocess, and is likely not a problem with pip.
3.135 error: metadata-generation-failed
3.135
3.135 × Encountered error while generating package metadata.
3.135 ╰─> See above for output.
3.135
3.135 note: This is an issue with the package mentioned above, not pip.
3.135 hint: See above for details.
------
Dockerfile:46
--------------------
  45 |     COPY requirements-cuda.txt requirements-cuda.txt
  46 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
  47 | >>>     python3 -m pip install -r requirements-cuda.txt
  48 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install -r requirements-cuda.txt" did not complete successfully: exit code: 1
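
To reproduce the xformers failure outside this Dockerfile (a sketch, assuming an arm64 host, which is what the aarch64 wheels above suggest; not run here): there is no prebuilt aarch64 xformers 0.0.27 wheel, so pip falls back to the sdist, and its setup.py imports torch during metadata generation, before torch has been installed.

❯ docker run --rm --platform linux/arm64 python:3.10 pip install xformers==0.0.27

This should end in the same ModuleNotFoundError: No module named 'torch'.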
tadamcz commented 4 months ago

A naive workaround is to add `RUN pip install torch` immediately before the failing step, so that the relevant part of the Dockerfile reads:
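
RUN pip install torch
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt

This leads to another error: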

❯ docker build . --target vllm-base
[+] Building 23.9s (23/51)                                                                                                                           docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                    0.0s
 => => transferring context: 50B                                                                                                                                     0.0s
 => [internal] load build definition from Dockerfile                                                                                                                 0.0s
 => => transferring dockerfile: 9.00kB                                                                                                                               0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04                                                                                       0.5s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04                                                                                      0.5s
 => [base  1/14] FROM docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4                         0.0s
 => [vllm-base  1/10] FROM docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0                     0.0s
 => [internal] load build context                                                                                                                                    0.0s
 => => transferring context: 31.49kB                                                                                                                                 0.0s
 => CACHED [vllm-base  2/10] WORKDIR /vllm-workspace                                                                                                                 0.0s
 => CACHED [vllm-base  3/10] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/America select Los_Angeles' |   0.0s
 => CACHED [vllm-base  4/10] RUN apt-get update -y     && apt-get install -y python3-pip git vim curl libibverbs-dev                                                 0.0s
 => CACHED [vllm-base  5/10] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                                                          0.0s
 => CACHED [vllm-base  6/10] RUN python3 -m pip --version                                                                                                            0.0s
 => CACHED [vllm-base  7/10] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                                                     0.0s
 => CACHED [base  2/14] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debco  0.0s
 => CACHED [base  3/14] RUN apt-get update -y     && apt-get install -y git curl sudo                                                                                0.0s
 => CACHED [base  4/14] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                                                               0.0s
 => CACHED [base  5/14] RUN python3 -m pip --version                                                                                                                 0.0s
 => CACHED [base  6/14] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                                                          0.0s
 => CACHED [base  7/14] WORKDIR /workspace                                                                                                                           0.0s
 => CACHED [base  8/14] COPY requirements-common.txt requirements-common.txt                                                                                         0.0s
 => CACHED [base  9/14] COPY requirements-cuda.txt requirements-cuda.txt                                                                                             0.0s
 => [base 10/14] RUN pip install torch                                                                                                                              18.5s
 => ERROR [base 11/14] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-cuda.txt                                            4.7s
------
 > [base 11/14] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-cuda.txt:
0.471 Collecting cmake>=3.21 (from -r requirements-common.txt (line 1))
0.473   Using cached cmake-3.30.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
0.500 Collecting ninja (from -r requirements-common.txt (line 2))
0.501   Using cached ninja-1.11.1.1-py2.py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.metadata (5.3 kB)
0.562 Collecting psutil (from -r requirements-common.txt (line 3))
0.563   Using cached psutil-6.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (21 kB)
0.611 Collecting sentencepiece (from -r requirements-common.txt (line 4))
0.612   Using cached sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.7 kB)
0.739 Collecting numpy<2.0.0 (from -r requirements-common.txt (line 5))
0.740   Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)
0.742 Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from -r requirements-common.txt (line 6)) (2.22.0)
0.788 Collecting tqdm (from -r requirements-common.txt (line 7))
0.789   Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
0.806 Collecting py-cpuinfo (from -r requirements-common.txt (line 8))
0.806   Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
0.842 Collecting transformers>=4.43.2 (from -r requirements-common.txt (line 9))
0.843   Using cached transformers-4.43.3-py3-none-any.whl.metadata (43 kB)
1.108 Collecting tokenizers>=0.19.1 (from -r requirements-common.txt (line 10))
1.109   Using cached tokenizers-0.19.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.7 kB)
1.158 Collecting fastapi (from -r requirements-common.txt (line 11))
1.159   Using cached fastapi-0.111.1-py3-none-any.whl.metadata (26 kB)
1.341 Collecting aiohttp (from -r requirements-common.txt (line 12))
1.342   Using cached aiohttp-3.9.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.5 kB)
1.405 Collecting openai (from -r requirements-common.txt (line 13))
1.407   Using cached openai-1.37.1-py3-none-any.whl.metadata (22 kB)
1.496 Collecting pydantic>=2.0 (from -r requirements-common.txt (line 15))
1.497   Using cached pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
1.629 Collecting pillow (from -r requirements-common.txt (line 16))
1.630   Using cached pillow-10.4.0-cp310-cp310-manylinux_2_28_aarch64.whl.metadata (9.2 kB)
1.651 Collecting prometheus_client>=0.18.0 (from -r requirements-common.txt (line 17))
1.652   Using cached prometheus_client-0.20.0-py3-none-any.whl.metadata (1.8 kB)
1.672 Collecting prometheus-fastapi-instrumentator>=7.0.0 (from -r requirements-common.txt (line 18))
1.672   Using cached prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl.metadata (13 kB)
1.701 Collecting tiktoken>=0.6.0 (from -r requirements-common.txt (line 19))
1.702   Using cached tiktoken-0.7.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.6 kB)
1.727 Collecting lm-format-enforcer==0.10.3 (from -r requirements-common.txt (line 20))
1.728   Using cached lm_format_enforcer-0.10.3-py3-none-any.whl.metadata (16 kB)
1.748 Collecting outlines<0.1,>=0.0.43 (from -r requirements-common.txt (line 21))
1.749   Using cached outlines-0.0.46-py3-none-any.whl.metadata (15 kB)
1.752 Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from -r requirements-common.txt (line 22)) (4.12.2)
1.753 Requirement already satisfied: filelock>=3.10.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements-common.txt (line 23)) (3.15.4)
1.936 Collecting pyzmq (from -r requirements-common.txt (line 24))
1.937   Using cached pyzmq-26.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
2.012 Collecting ray>=2.9 (from -r requirements-cuda.txt (line 5))
2.013   Using cached ray-2.33.0-cp310-cp310-manylinux2014_aarch64.whl.metadata (13 kB)
2.050 Collecting nvidia-ml-py (from -r requirements-cuda.txt (line 6))
2.051   Using cached nvidia_ml_py-12.555.43-py3-none-any.whl.metadata (8.6 kB)
2.087 Collecting torch==2.3.1 (from -r requirements-cuda.txt (line 7))
2.088   Using cached torch-2.3.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (26 kB)
2.129 Collecting torchvision==0.18.1 (from -r requirements-cuda.txt (line 9))
2.130   Using cached torchvision-0.18.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (6.6 kB)
2.152 Collecting xformers==0.0.27 (from -r requirements-cuda.txt (line 10))
2.153   Using cached xformers-0.0.27.tar.gz (4.4 MB)
2.965   Preparing metadata (setup.py): started
4.327   Preparing metadata (setup.py): finished with status 'done'
4.359 ERROR: Could not find a version that satisfies the requirement vllm-flash-attn==2.5.9.post1 (from versions: none)
4.365 ERROR: No matching distribution found for vllm-flash-attn==2.5.9.post1
------
Dockerfile:47
--------------------
  46 |     RUN pip install torch
  47 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
  48 | >>>     python3 -m pip install -r requirements-cuda.txt
  49 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install -r requirements-cuda.txt" did not complete successfully: exit code: 1

I'm not sure how to make sense of this error.

I am reaching the limit of my knowledge of Python packaging, but to speculate: the build is running on an arm64 host (pip selects aarch64 wheels throughout the output above), and while the other requirements do have aarch64 wheels, vllm-flash-attn==2.5.9.post1 appears to publish no aarch64 distributions at all, hence "(from versions: none)".
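
If that's right, pip alone should reproduce it (a sanity check, not run here; pip download resolves the same way pip install does):

❯ python3 -m pip download vllm-flash-attn==2.5.9.post1 --no-deps -d /tmp/wheels

On an arm64 host this should fail with the same "No matching distribution found" error as above, while succeeding under an amd64 interpreter.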

tadamcz commented 4 months ago

After setting the target platform to linux/amd64, installing requirements-cuda.txt succeeds, but I run into yet another error. This time it fails in vllm's own setup.py:

❯ docker buildx build . --target vllm-base --platform linux/amd64
[+] Building 1998.1s (47/52)                                                                                                docker-container:predexp_dependencies_builder
 => [internal] booting buildkit                                                                                                                                      9.5s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                                                   8.8s
 => => creating container buildx_buildkit_predexp_dependencies_builder0                                                                                              0.8s
 => [internal] load build definition from Dockerfile                                                                                                                 0.0s
 => => transferring dockerfile: 8.98kB                                                                                                                               0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 133)                                                                                     0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 143)                                                                                     0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04                                                                                       1.4s
 => [internal] load metadata for docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04                                                                                      1.4s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                                                    0.0s
 => => transferring context: 50B                                                                                                                                     0.0s
 => [internal] load build context                                                                                                                                    0.1s
 => => transferring context: 4.76MB                                                                                                                                  0.1s
 => [base  1/13] FROM docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4                       315.4s
 => => resolve docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4                                0.0s
 => => sha256:6223811417458a3c93b84ee3b65f8b08d9e2828b926f0aed863041610d7d95d4 86.55kB / 86.55kB                                                                     0.3s
 => => sha256:7c373e2d9b7e82a6878d4a31293dd857915a0fe47d07dce541cea03b043d57fc 2.63GB / 2.63GB                                                                     283.8s
 => => sha256:1f4e68d7b5e4224ba1da78ef461ff7f01e8d59c09d39281277521384105a9441 1.52kB / 1.52kB                                                                       0.2s
 => => sha256:4829486be7c30f19f4136fa56adbb3de206ed0bbf0705b59fb2147406778ce38 1.69kB / 1.69kB                                                                       0.2s
 => => sha256:71bdb1a72c2d6dc97bbdbca82383f0260c4ee87556701e8e606c08a6bb0f0da5 62.64kB / 62.64kB                                                                     0.3s
 => => sha256:30c0ea6140d07e2a8deb70d780f277c63cf61836ff33d66eef944728a4bef6bd 1.37GB / 1.37GB                                                                     138.5s
 => => extracting sha256:30c0ea6140d07e2a8deb70d780f277c63cf61836ff33d66eef944728a4bef6bd                                                                           13.5s
 => => extracting sha256:71bdb1a72c2d6dc97bbdbca82383f0260c4ee87556701e8e606c08a6bb0f0da5                                                                            0.0s
 => => extracting sha256:4829486be7c30f19f4136fa56adbb3de206ed0bbf0705b59fb2147406778ce38                                                                            0.0s
 => => extracting sha256:1f4e68d7b5e4224ba1da78ef461ff7f01e8d59c09d39281277521384105a9441                                                                            0.0s
 => => extracting sha256:7c373e2d9b7e82a6878d4a31293dd857915a0fe47d07dce541cea03b043d57fc                                                                           30.9s
 => => extracting sha256:6223811417458a3c93b84ee3b65f8b08d9e2828b926f0aed863041610d7d95d4                                                                            0.0s
 => [vllm-base  1/10] FROM docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0                    16.3s
 => => resolve docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0                                 0.0s
 => => sha256:56dc8550293751a1604e97ac949cfae82ba20cb2a28e034737bafd7382559609 6.89kB / 6.89kB                                                                       0.1s
 => => sha256:db6cdef1932a0d9ca6ef9a539e08d491f66d1b1ed81926ae1525375bdd8100cc 185B / 185B                                                                           0.4s
 => => sha256:c7232af9ae05f7de83f8d6171bd0c35a4dd0a85ebafb15b950dbc08f89ea5fb5 57.59MB / 57.59MB                                                                    15.5s
 => => sha256:fbcd35dc5bc3a7bda41926aadd083020f942b001ebac6f1d30480f0f065394c0 7.94MB / 7.94MB                                                                       2.9s
 => => sha256:43cfb69dbb464ebad014cd4687bf02ee4f5011d540916c658af36faafbfd3481 27.51MB / 27.51MB                                                                     4.1s
 => => extracting sha256:43cfb69dbb464ebad014cd4687bf02ee4f5011d540916c658af36faafbfd3481                                                                            0.5s
 => => extracting sha256:fbcd35dc5bc3a7bda41926aadd083020f942b001ebac6f1d30480f0f065394c0                                                                            0.1s
 => => extracting sha256:c7232af9ae05f7de83f8d6171bd0c35a4dd0a85ebafb15b950dbc08f89ea5fb5                                                                            0.7s
 => => extracting sha256:db6cdef1932a0d9ca6ef9a539e08d491f66d1b1ed81926ae1525375bdd8100cc                                                                            0.0s
 => => extracting sha256:56dc8550293751a1604e97ac949cfae82ba20cb2a28e034737bafd7382559609                                                                            0.0s
 => [vllm-base  2/10] WORKDIR /vllm-workspace                                                                                                                        0.1s
 => [vllm-base  3/10] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debco  221.6s
 => [vllm-base  4/10] RUN apt-get update -y     && apt-get install -y python3-pip git vim curl libibverbs-dev                                                      103.8s
 => [base  2/13] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections     && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debconf-se  195.4s
 => [vllm-base  5/10] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                                                                18.2s
 => [vllm-base  6/10] RUN python3 -m pip --version                                                                                                                   1.3s
 => [vllm-base  7/10] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                                                            0.2s
 => [base  3/13] RUN apt-get update -y     && apt-get install -y git curl sudo                                                                                      41.1s
 => [base  4/13] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10                                                                                     24.3s
 => [base  5/13] RUN python3 -m pip --version                                                                                                                        1.2s
 => [base  6/13] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/                                                                                 0.2s
 => [base  7/13] WORKDIR /workspace                                                                                                                                  0.0s
 => [base  8/13] COPY requirements-common.txt requirements-common.txt                                                                                                0.0s
 => [base  9/13] COPY requirements-cuda.txt requirements-cuda.txt                                                                                                    0.0s
 => [base 10/13] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-cuda.txt                                                741.4s
 => [base 11/13] COPY requirements-mamba.txt requirements-mamba.txt                                                                                                  0.4s
 => [base 12/13] RUN python3 -m pip install packaging                                                                                                                6.9s
 => [base 13/13] RUN python3 -m pip install -r requirements-mamba.txt                                                                                              138.6s
 => [dev 1/4] COPY requirements-lint.txt requirements-lint.txt                                                                                                       0.0s
 => [build  1/15] COPY requirements-build.txt requirements-build.txt                                                                                                 0.0s
 => [dev 2/4] COPY requirements-test.txt requirements-test.txt                                                                                                       0.0s
 => [build  2/15] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-build.txt                                                6.9s
 => [dev 3/4] COPY requirements-dev.txt requirements-dev.txt                                                                                                         0.0s
 => [dev 4/4] RUN --mount=type=cache,target=/root/.cache/pip     python3 -m pip install -r requirements-dev.txt                                                    190.6s
 => [build  3/15] RUN apt-get update -y && apt-get install -y ccache                                                                                                28.2s
 => [build  4/15] COPY csrc csrc                                                                                                                                     0.0s
 => [build  5/15] COPY setup.py setup.py                                                                                                                             0.0s
 => [build  6/15] COPY cmake cmake                                                                                                                                   0.0s
 => [build  7/15] COPY CMakeLists.txt CMakeLists.txt                                                                                                                 0.0s
 => [build  8/15] COPY requirements-common.txt requirements-common.txt                                                                                               0.0s
 => [build  9/15] COPY requirements-cuda.txt requirements-cuda.txt                                                                                                   0.0s
 => [build 10/15] COPY pyproject.toml pyproject.toml                                                                                                                 0.0s
 => [build 11/15] COPY vllm vllm                                                                                                                                     0.1s
 => [build 12/15] RUN --mount=type=cache,target=/root/.cache/pip     if [ "$USE_SCCACHE" = "1" ]; then         echo "Installing sccache..."         && curl -L -o s  0.1s
 => ERROR [build 13/15] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     if [ "$USE_SCCACHE" != "1" ]; then    486.5s
 => [mamba-builder 1/3] WORKDIR /usr/src/mamba                                                                                                                       0.2s
 => [mamba-builder 2/3] COPY requirements-mamba.txt requirements-mamba.txt                                                                                           0.0s
 => [mamba-builder 3/3] RUN pip --verbose wheel -r requirements-mamba.txt     --no-build-isolation --no-deps --no-cache-dir                                        136.2s
------
 > [build 13/15] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     if [ "$USE_SCCACHE" != "1" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi:
13.12 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
13.54 running bdist_wheel
13.69 running build
13.69 running build_py
13.74 creating build
13.74 creating build/lib.linux-x86_64-cpython-310
13.74 creating build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/envs.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/block.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/utils.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/pooling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/config.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/version.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/connections.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/logger.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/commit_id.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 creating build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 copying vllm/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 copying vllm/triton_utils/custom_cache_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 creating build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/async_timeout.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.76 copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.76 creating build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/xpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/worker_base.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/openvino_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/tpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/neuron_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/neuron_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/xpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/cpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/embedding_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/cpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/openvino_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/model_runner_base.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/tpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/custom_op.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/pooling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 creating build/lib.linux-x86_64-cpython-310/vllm/core
13.77 copying vllm/core/block_manager_v1.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.77 copying vllm/core/evictor_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/evictor_v1.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/embedding_model_block_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/block_manager_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/chat_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/logger.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 creating build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/request.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/models.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 creating build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/image.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/base.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 creating build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/spec_decode_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/smaller_tp_proposer_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/target_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/ngram_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/proposer_worker_base.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/batch_expansion.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/top1_proposer.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/mlp_speculator_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/util.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/medusa_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/multi_step_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/draft_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 creating build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/request.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/models.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.81 copying vllm/prompt_adapter/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 copying vllm/logging/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 copying vllm/logging/formatter.py -> build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 copying vllm/usage/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 copying vllm/usage/usage_lib.py -> build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/parallel_state.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/communication_op.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.82 copying vllm/distributed/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.82 creating build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/ray_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/multiproc_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/cpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/distributed_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/executor_base.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/tpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/ray_tpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/xpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/openvino_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/neuron_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/multiproc_worker_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 copying vllm/executor/ray_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 copying vllm/executor/ray_xpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/selector.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/layer.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/registry.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/image.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/base.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.83 copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.83 copying vllm/transformers_utils/image_processor.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/detokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/registry.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/data.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/server
13.84 copying vllm/server/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/server
13.84 copying vllm/server/launch.py -> build/lib.linux-x86_64-cpython-310/vllm/server
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/punica.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/fully_sharded_layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 creating build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/tpu.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/rocm.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/cuda.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/interface.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 creating build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/multi_step.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/stop_checker.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/util.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/single_step.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/weight_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/neuron.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/loader.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/tensorizer.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/openvino.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.86 copying vllm/model_executor/guided_decoding/outlines_decoding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.86 copying vllm/model_executor/guided_decoding/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 copying vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 copying vllm/model_executor/guided_decoding/outlines_logits_processors.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/pooler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/rejection_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/rotary_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/logits_processor.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/typical_acceptance_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/spec_decode_base_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.88 copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.88 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/llava.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/xverse.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/intern_vit.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/llama_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/commandr.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/blip2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/fuyu.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gemma2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/mlp_speculator.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/arctic.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/llava_next.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/jamba.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/phi3_small.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/phi3v.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/paligemma.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/olmo.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/internlm2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/decilm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/medusa.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/mixtral_quant.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/persimmon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/gemma.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/clip.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/deepseek_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/chameleon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/minicpm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/dbrx.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/minicpmv.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/internvl.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/nemotron.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/jais.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/orion.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/blip.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.92 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/aqlm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/deepspeedfp.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/rand.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/sample.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.94 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.95 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_unquantized.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 creating build/lib.linux-x86_64-cpython-310/vllm/core/block
13.95 copying vllm/core/block/block_table.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.95 copying vllm/core/block/prefix_caching_block.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/cpu_gpu_block_allocator.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/common.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/naive_block.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/run_batch.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/serving_tokenization.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/cli_args.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 creating build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/pynccl.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/tpu_communicator.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/shm_broadcast.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/cuda_wrapper.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/custom_all_reduce_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/pynccl_wrapper.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 creating build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/ipex_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/blocksparse_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/rocm_flash_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/abstract.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/openvino.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/flashinfer.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/pallas.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/torch_sdpa.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 creating build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.98 copying vllm/attention/ops/ipex_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.98 copying vllm/attention/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/triton_flash_attention.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/paged_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/prefix_prefill.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 creating build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/interface.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/blocksparse_attention_kernel.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/base_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/ray_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
14.00 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/dbrx.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/internvl.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.01 copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.03 copying vllm/py.typed -> build/lib.linux-x86_64-cpython-310/vllm
14.03 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.08 running build_ext
14.91 Using MAX_JOBS=2 as the number of jobs.
14.99 Using NVCC_THREADS=8 as the number of nvcc threads.
17.09 -- The CXX compiler identification is GNU 9.4.0
17.61 -- Detecting CXX compiler ABI info
20.34 -- Detecting CXX compiler ABI info - done
20.52 -- Check for working CXX compiler: /usr/bin/c++ - skipped
20.52 -- Detecting CXX compile features
20.53 -- Detecting CXX compile features - done
20.53 -- Build type: RelWithDebInfo
20.53 -- Target device: cuda
22.37 -- Found Python: /usr/bin/python3 (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule
22.38 -- Found python matching: /usr/bin/python3.
34.56 -- Found CUDA: /usr/local/cuda (found version "12.4") 
41.94 -- The CUDA compiler identification is NVIDIA 12.4.131
42.01 -- Detecting CUDA compiler ABI info
49.27 -- Detecting CUDA compiler ABI info - done
49.83 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
49.90 -- Detecting CUDA compile features
49.90 -- Detecting CUDA compile features - done
49.92 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.131")
49.95 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
52.41 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
52.41 -- Looking for pthread_create in pthreads
54.68 -- Looking for pthread_create in pthreads - not found
54.68 -- Looking for pthread_create in pthread
57.05 -- Looking for pthread_create in pthread - found
57.06 -- Found Threads: TRUE
57.19 -- Caffe2: CUDA detected: 12.4
57.19 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
57.19 -- Caffe2: CUDA toolkit directory: /usr/local/cuda
59.76 -- Caffe2: Header version is: 12.4
59.77 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:184 (message):
59.77   Failed to compute shorthash for libnvrtc.so
59.77 Call Stack (most recent call first):
59.77   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
59.77   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
59.77   CMakeLists.txt:67 (find_package)
59.77 
59.77 
59.77 -- USE_CUDNN is set to 0. Compiling without cuDNN support
59.77 -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
59.77 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/utils.cmake:385 (message):
59.77   In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
59.77   to cmake instead of implicitly setting it as an env variable.  This will
59.77   become a FATAL_ERROR in future version of pytorch.
59.77 Call Stack (most recent call first):
59.77   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:342 (torch_cuda_get_nvcc_gencode_flag)
59.77   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
59.77   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
59.77   CMakeLists.txt:67 (find_package)
59.77 
59.77 
59.78 -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
59.81 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
59.81   static library kineto_LIBRARY-NOTFOUND not found.
59.81 Call Stack (most recent call first):
59.81   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
59.81   CMakeLists.txt:67 (find_package)
59.81 
59.81 
59.82 -- Found Torch: /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so
59.82 -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
59.83 -- CUDA target arches: 70-real;75-real;80-real;86-real;89-real;90-real;90-virtual
88.38 -- CMake Version: 3.30.1
88.38 -- CUTLASS 3.5.0
88.38 -- CUDART: /usr/local/cuda/lib64/libcudart.so
88.38 -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
88.38 -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so
88.40 -- Default Install Location: install
90.20 -- Found Python3: /usr/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
90.22 -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
90.22 -- Enable caching of reference results in conv unit tests
90.22 -- Enable rigorous conv problem sizes in conv unit tests
90.23 -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
90.36 fatal: not a git repository (or any of the parent directories): .git
90.36 -- CUTLASS Revision: Unable to detect, Git returned code 128.
90.39 -- Configuring cublas ...
90.39 -- cuBLAS Disabled.
90.39 -- Configuring cuBLAS ... done.
103.8 -- Completed generation of library instances. See /workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-build/tools/library/library_instance_generation.log for more information.
111.6 -- Punica target arches: 80-real;86-real;89-real;90-real;90-virtual
111.6 -- Enabling C extension.
111.6 -- Enabling moe extension.
111.6 -- Enabling punica extension.
111.7 -- Configuring done (96.0s)
125.5 -- Generating done (13.8s)
125.5 -- Build files have been written to: /workspace/build/temp.linux-x86_64-cpython-310
125.8 Using MAX_JOBS=2 as the number of jobs.
125.9 Using NVCC_THREADS=8 as the number of nvcc threads.
127.5 [0/2] Re-checking globbed directories...
267.8 [1/38] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
480.5 [2/38] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
480.5 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o 
480.5 ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/workspace/csrc -I/workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -I/workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/tools/util/include -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=8 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/cache_kernels.cu.o -MF CMakeFiles/_C.dir/csrc/cache_kernels.cu.o.d -x cu -c /workspace/csrc/cache_kernels.cu -o CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2"
480.5   }
480.5   ^
480.5 
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat16, __nv_bfloat16)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat16, A=__nv_bfloat16, B=__nv_bfloat16]"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat162, A=__nv_bfloat162, B=__nv_bfloat162]"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat162, __nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat16, __nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, __nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=__nv_bfloat16]"
480.5   }
480.5   ^
480.5 
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 374 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 376 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 378 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 380 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 382 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 384 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2"
480.5   }
480.5   ^
480.5 
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat16, __nv_bfloat16)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat16, A=__nv_bfloat16, B=__nv_bfloat16]"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat162, A=__nv_bfloat162, B=__nv_bfloat162]"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat162, __nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat16, __nv_bfloat162, __nv_bfloat162)"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, __nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=__nv_bfloat16]"
480.5   }
480.5   ^
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 374 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 376 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 378 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 380 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 382 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 384 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 374 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 376 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 378 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 380 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 382 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5   }
480.5   ^
480.5           detected during:
480.5             instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5             instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 384 of /workspace/csrc/cache_kernels.cu
480.5 
480.5 ninja: build stopped: subcommand failed.
481.9 Traceback (most recent call last):
482.0   File "/workspace/setup.py", line 459, in <module>
482.1     setup(
482.1   File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 108, in setup
482.1     return distutils.core.setup(**attrs)
482.1   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 184, in setup
482.1     return run_commands(dist)
482.1   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 200, in run_commands
482.1     dist.run_commands()
482.1   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 970, in run_commands
482.2     self.run_command(cmd)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2     super().run_command(command)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2     cmd_obj.run()
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_wheel.py", line 373, in run
482.2     self.run_command("build")
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
482.2     self.distribution.run_command(command)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2     super().run_command(command)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2     cmd_obj.run()
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
482.2     self.run_command(cmd_name)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
482.2     self.distribution.run_command(command)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2     super().run_command(command)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2     cmd_obj.run()
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 93, in run
482.2     _build_ext.run(self)
482.2   File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
482.2     self.build_extensions()
482.2   File "/workspace/setup.py", line 234, in build_extensions
482.2     subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
482.2   File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
482.2     raise CalledProcessError(retcode, cmd)
482.2 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=1', '--target=_moe_C', '--target=_C', '--target=_punica_C']' returned non-zero exit status 1.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load

 2 warnings found (use --debug to expand):
 - FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 133)
 - FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 143)
Dockerfile:120
--------------------
 119 |     ENV CCACHE_DIR=/root/.cache/ccache
 120 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
 121 | >>>     --mount=type=cache,target=/root/.cache/pip \
 122 | >>>     if [ "$USE_SCCACHE" != "1" ]; then \
 123 | >>>         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \
 124 | >>>     fi
 125 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi" did not complete successfully: exit code: 1
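
Note the repeated `Killed` lines interleaved with the jemalloc/QEMU notices above: they look like nvcc worker processes being OOM-killed during the emulated build. As a sanity check (untested, and assuming the Dockerfile's `max_jobs`/`nvcc_threads` build args are what feed the `Using MAX_JOBS=...`/`Using NVCC_THREADS=...` lines in the log), the parallelism can be dialed down:

```
# Untested workaround sketch: minimize build parallelism so a single nvcc
# invocation is less likely to exhaust memory under QEMU emulation.
docker build . --target vllm-base \
  --build-arg max_jobs=1 \
  --build-arg nvcc_threads=1
```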
drikster80 commented 4 months ago

What system are you trying to build this on? The first two posts suggest it's an ARM64 (aarch64) system...
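
(If unsure, the host architecture is quick to confirm with standard tooling, nothing vLLM-specific:)

```
# Report the machine hardware name: arm64 hosts print "aarch64" (Linux)
# or "arm64" (macOS); amd64 hosts print "x86_64".
uname -m
```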

tadamcz commented 4 months ago

I'm building on arm64 and targeting amd64 (so the first two aren't relevant to my exact use case, though they arguably still shouldn't error out?). In any case, you can see the build also fails for amd64.
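
For reference, the cross-platform repro looks like the sketch below (standard buildx usage; the `--platform` flag is Docker's, not something this repo adds). This is also what produces the jemalloc/QEMU notices in the log:

```
# Build the amd64 image from an arm64 host; the amd64 toolchain then runs
# under QEMU emulation, which is where the jemalloc MADV_DONTNEED notices
# and the very slow nvcc steps come from.
docker buildx build . --platform linux/amd64 --target vllm-base
```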

bzr1 commented 2 months ago

I got the same issue.

umid-podo commented 2 weeks ago

I am experiencing the same issue.

potaninmt commented 2 weeks ago

Same error. The build freezes after this message:

[2/130] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
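
If this is the same OOM-under-emulation failure as above, the kernel log on the build host should show it; a minimal check, assuming a Linux host (or the Docker Desktop VM) where `dmesg` is readable:

```
# Hypothetical check: see whether the kernel OOM killer is reaping the
# nvcc/cicc compiler processes while the build appears frozen.
sudo dmesg --ctime | grep -iE 'out of memory|killed process'
```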