Open tadamcz opened 4 months ago
A naive workaround, adding `RUN pip install torch` on the line before `RUN --mount=type=cache,target=/root/.cache/pip \ python3 -m pip install -r requirements-cuda.txt`, leads to another error:
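For reference, the relevant portion of the Dockerfile with the workaround applied (a sketch; the surrounding stages and lines are elided):

```dockerfile
# Sketch of the workaround described above; surrounding stages elided.
# Pre-installing torch is an attempt to satisfy build-time dependencies
# before the pinned CUDA requirements are resolved.
RUN pip install torch
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install -r requirements-cuda.txt
```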
❯ docker build . --target vllm-base
[+] Building 23.9s (23/51) docker:desktop-linux
=> [internal] load .dockerignore 0.0s
=> => transferring context: 50B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 9.00kB 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04 0.5s
=> [internal] load metadata for docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04 0.5s
=> [base 1/14] FROM docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4 0.0s
=> [vllm-base 1/10] FROM docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 31.49kB 0.0s
=> CACHED [vllm-base 2/10] WORKDIR /vllm-workspace 0.0s
=> CACHED [vllm-base 3/10] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections && echo 'tzdata tzdata/Zones/America select Los_Angeles' | 0.0s
=> CACHED [vllm-base 4/10] RUN apt-get update -y && apt-get install -y python3-pip git vim curl libibverbs-dev 0.0s
=> CACHED [vllm-base 5/10] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 0.0s
=> CACHED [vllm-base 6/10] RUN python3 -m pip --version 0.0s
=> CACHED [vllm-base 7/10] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/ 0.0s
=> CACHED [base 2/14] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debco 0.0s
=> CACHED [base 3/14] RUN apt-get update -y && apt-get install -y git curl sudo 0.0s
=> CACHED [base 4/14] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 0.0s
=> CACHED [base 5/14] RUN python3 -m pip --version 0.0s
=> CACHED [base 6/14] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/ 0.0s
=> CACHED [base 7/14] WORKDIR /workspace 0.0s
=> CACHED [base 8/14] COPY requirements-common.txt requirements-common.txt 0.0s
=> CACHED [base 9/14] COPY requirements-cuda.txt requirements-cuda.txt 0.0s
=> [base 10/14] RUN pip install torch 18.5s
=> ERROR [base 11/14] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install -r requirements-cuda.txt 4.7s
------
> [base 11/14] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install -r requirements-cuda.txt:
0.471 Collecting cmake>=3.21 (from -r requirements-common.txt (line 1))
0.473 Using cached cmake-3.30.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
0.500 Collecting ninja (from -r requirements-common.txt (line 2))
0.501 Using cached ninja-1.11.1.1-py2.py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.metadata (5.3 kB)
0.562 Collecting psutil (from -r requirements-common.txt (line 3))
0.563 Using cached psutil-6.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (21 kB)
0.611 Collecting sentencepiece (from -r requirements-common.txt (line 4))
0.612 Using cached sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.7 kB)
0.739 Collecting numpy<2.0.0 (from -r requirements-common.txt (line 5))
0.740 Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)
0.742 Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from -r requirements-common.txt (line 6)) (2.22.0)
0.788 Collecting tqdm (from -r requirements-common.txt (line 7))
0.789 Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
0.806 Collecting py-cpuinfo (from -r requirements-common.txt (line 8))
0.806 Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
0.842 Collecting transformers>=4.43.2 (from -r requirements-common.txt (line 9))
0.843 Using cached transformers-4.43.3-py3-none-any.whl.metadata (43 kB)
1.108 Collecting tokenizers>=0.19.1 (from -r requirements-common.txt (line 10))
1.109 Using cached tokenizers-0.19.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.7 kB)
1.158 Collecting fastapi (from -r requirements-common.txt (line 11))
1.159 Using cached fastapi-0.111.1-py3-none-any.whl.metadata (26 kB)
1.341 Collecting aiohttp (from -r requirements-common.txt (line 12))
1.342 Using cached aiohttp-3.9.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.5 kB)
1.405 Collecting openai (from -r requirements-common.txt (line 13))
1.407 Using cached openai-1.37.1-py3-none-any.whl.metadata (22 kB)
1.496 Collecting pydantic>=2.0 (from -r requirements-common.txt (line 15))
1.497 Using cached pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
1.629 Collecting pillow (from -r requirements-common.txt (line 16))
1.630 Using cached pillow-10.4.0-cp310-cp310-manylinux_2_28_aarch64.whl.metadata (9.2 kB)
1.651 Collecting prometheus_client>=0.18.0 (from -r requirements-common.txt (line 17))
1.652 Using cached prometheus_client-0.20.0-py3-none-any.whl.metadata (1.8 kB)
1.672 Collecting prometheus-fastapi-instrumentator>=7.0.0 (from -r requirements-common.txt (line 18))
1.672 Using cached prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl.metadata (13 kB)
1.701 Collecting tiktoken>=0.6.0 (from -r requirements-common.txt (line 19))
1.702 Using cached tiktoken-0.7.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.6 kB)
1.727 Collecting lm-format-enforcer==0.10.3 (from -r requirements-common.txt (line 20))
1.728 Using cached lm_format_enforcer-0.10.3-py3-none-any.whl.metadata (16 kB)
1.748 Collecting outlines<0.1,>=0.0.43 (from -r requirements-common.txt (line 21))
1.749 Using cached outlines-0.0.46-py3-none-any.whl.metadata (15 kB)
1.752 Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from -r requirements-common.txt (line 22)) (4.12.2)
1.753 Requirement already satisfied: filelock>=3.10.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements-common.txt (line 23)) (3.15.4)
1.936 Collecting pyzmq (from -r requirements-common.txt (line 24))
1.937 Using cached pyzmq-26.0.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (6.1 kB)
2.012 Collecting ray>=2.9 (from -r requirements-cuda.txt (line 5))
2.013 Using cached ray-2.33.0-cp310-cp310-manylinux2014_aarch64.whl.metadata (13 kB)
2.050 Collecting nvidia-ml-py (from -r requirements-cuda.txt (line 6))
2.051 Using cached nvidia_ml_py-12.555.43-py3-none-any.whl.metadata (8.6 kB)
2.087 Collecting torch==2.3.1 (from -r requirements-cuda.txt (line 7))
2.088 Using cached torch-2.3.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (26 kB)
2.129 Collecting torchvision==0.18.1 (from -r requirements-cuda.txt (line 9))
2.130 Using cached torchvision-0.18.1-cp310-cp310-manylinux2014_aarch64.whl.metadata (6.6 kB)
2.152 Collecting xformers==0.0.27 (from -r requirements-cuda.txt (line 10))
2.153 Using cached xformers-0.0.27.tar.gz (4.4 MB)
2.965 Preparing metadata (setup.py): started
4.327 Preparing metadata (setup.py): finished with status 'done'
4.359 ERROR: Could not find a version that satisfies the requirement vllm-flash-attn==2.5.9.post1 (from versions: none)
4.365 ERROR: No matching distribution found for vllm-flash-attn==2.5.9.post1
------
Dockerfile:47
--------------------
46 | RUN pip install torch
47 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
48 | >>> python3 -m pip install -r requirements-cuda.txt
49 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install -r requirements-cuda.txt" did not complete successfully: exit code: 1
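A side note on reading the failure: if I understand pip correctly, a file on PyPI is only a candidate when its wheel tags match the current interpreter and platform, so a package that publishes only x86_64 wheels and no sdist would be invisible on an aarch64 build. A rough sketch of that filtering (the wheel filenames below are hypothetical, for illustration only; real pip uses the full PEP 425 tag set):

```python
# Sketch of pip's candidate filtering, to illustrate how a package that
# exists on PyPI can still yield "from versions: none". The filenames
# below are hypothetical; the real file list lives on PyPI.
published = [
    "vllm_flash_attn-2.5.9.post1-cp310-cp310-manylinux1_x86_64.whl",
    "vllm_flash_attn-2.5.9.post1-cp311-cp311-manylinux1_x86_64.whl",
    # note: no *_aarch64 wheel and no .tar.gz sdist in this sketch
]

def wheel_tags(filename: str) -> tuple[str, str, str]:
    # Wheel filename convention: {dist}-{version}-{python}-{abi}-{platform}.whl
    dist, version, py, abi, plat = filename[: -len(".whl")].split("-")
    return py, abi, plat

def candidates(py: str, arch: str) -> list[str]:
    """Files pip would even consider for this interpreter/architecture."""
    out = []
    for f in published:
        if f.endswith(".tar.gz"):
            out.append(f)  # an sdist is a candidate on any platform
        else:
            fpy, _abi, fplat = wheel_tags(f)
            if fpy == py and fplat.endswith(arch):
                out.append(f)
    return out

print(candidates("cp310", "x86_64"))   # one matching wheel
print(candidates("cp310", "aarch64"))  # [] -> "from versions: none"
```

On `linux/amd64` a cp310 wheel matches in this sketch, which would be consistent with the build getting further once `--platform linux/amd64` is set (see below).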
I'm not sure how to make sense of this error. I am reaching the limit of my knowledge of Python packaging, but to speculate: does `from versions: none` mean that, despite the fact that the project exists on PyPI (https://pypi.org/project/vllm-flash-attn/), `vllm-flash-attn` contains neither compatible pre-built wheels nor a source distribution? This is surprising to me -- shouldn't a source distribution always be included?

After setting the target platform to `linux/amd64`, installing `requirements-cuda.txt` succeeds, but I get yet another, different error. This time it's failing in the actual `setup.py` of `vllm`:
❯ docker buildx build . --target vllm-base --platform linux/amd64
[+] Building 1998.1s (47/52) docker-container:predexp_dependencies_builder
=> [internal] booting buildkit 9.5s
=> => pulling image moby/buildkit:buildx-stable-1 8.8s
=> => creating container buildx_buildkit_predexp_dependencies_builder0 0.8s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 8.98kB 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 133) 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 143) 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04 1.4s
=> [internal] load metadata for docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04 1.4s
=> [auth] nvidia/cuda:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 50B 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 4.76MB 0.1s
=> [base 1/13] FROM docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4 315.4s
=> => resolve docker.io/nvidia/cuda:12.4.1-devel-ubuntu20.04@sha256:8d577fd078ae56c37493af4454a5b700c72a7f1aeb9ff3d0adc0459fc47752a4 0.0s
=> => sha256:6223811417458a3c93b84ee3b65f8b08d9e2828b926f0aed863041610d7d95d4 86.55kB / 86.55kB 0.3s
=> => sha256:7c373e2d9b7e82a6878d4a31293dd857915a0fe47d07dce541cea03b043d57fc 2.63GB / 2.63GB 283.8s
=> => sha256:1f4e68d7b5e4224ba1da78ef461ff7f01e8d59c09d39281277521384105a9441 1.52kB / 1.52kB 0.2s
=> => sha256:4829486be7c30f19f4136fa56adbb3de206ed0bbf0705b59fb2147406778ce38 1.69kB / 1.69kB 0.2s
=> => sha256:71bdb1a72c2d6dc97bbdbca82383f0260c4ee87556701e8e606c08a6bb0f0da5 62.64kB / 62.64kB 0.3s
=> => sha256:30c0ea6140d07e2a8deb70d780f277c63cf61836ff33d66eef944728a4bef6bd 1.37GB / 1.37GB 138.5s
=> => extracting sha256:30c0ea6140d07e2a8deb70d780f277c63cf61836ff33d66eef944728a4bef6bd 13.5s
=> => extracting sha256:71bdb1a72c2d6dc97bbdbca82383f0260c4ee87556701e8e606c08a6bb0f0da5 0.0s
=> => extracting sha256:4829486be7c30f19f4136fa56adbb3de206ed0bbf0705b59fb2147406778ce38 0.0s
=> => extracting sha256:1f4e68d7b5e4224ba1da78ef461ff7f01e8d59c09d39281277521384105a9441 0.0s
=> => extracting sha256:7c373e2d9b7e82a6878d4a31293dd857915a0fe47d07dce541cea03b043d57fc 30.9s
=> => extracting sha256:6223811417458a3c93b84ee3b65f8b08d9e2828b926f0aed863041610d7d95d4 0.0s
=> [vllm-base 1/10] FROM docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0 16.3s
=> => resolve docker.io/nvidia/cuda:12.4.1-base-ubuntu20.04@sha256:6fdb33fd81a5e214cfff44685aa32e3ab085c4ac506c2bd987c743b0221591d0 0.0s
=> => sha256:56dc8550293751a1604e97ac949cfae82ba20cb2a28e034737bafd7382559609 6.89kB / 6.89kB 0.1s
=> => sha256:db6cdef1932a0d9ca6ef9a539e08d491f66d1b1ed81926ae1525375bdd8100cc 185B / 185B 0.4s
=> => sha256:c7232af9ae05f7de83f8d6171bd0c35a4dd0a85ebafb15b950dbc08f89ea5fb5 57.59MB / 57.59MB 15.5s
=> => sha256:fbcd35dc5bc3a7bda41926aadd083020f942b001ebac6f1d30480f0f065394c0 7.94MB / 7.94MB 2.9s
=> => sha256:43cfb69dbb464ebad014cd4687bf02ee4f5011d540916c658af36faafbfd3481 27.51MB / 27.51MB 4.1s
=> => extracting sha256:43cfb69dbb464ebad014cd4687bf02ee4f5011d540916c658af36faafbfd3481 0.5s
=> => extracting sha256:fbcd35dc5bc3a7bda41926aadd083020f942b001ebac6f1d30480f0f065394c0 0.1s
=> => extracting sha256:c7232af9ae05f7de83f8d6171bd0c35a4dd0a85ebafb15b950dbc08f89ea5fb5 0.7s
=> => extracting sha256:db6cdef1932a0d9ca6ef9a539e08d491f66d1b1ed81926ae1525375bdd8100cc 0.0s
=> => extracting sha256:56dc8550293751a1604e97ac949cfae82ba20cb2a28e034737bafd7382559609 0.0s
=> [vllm-base 2/10] WORKDIR /vllm-workspace 0.1s
=> [vllm-base 3/10] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debco 221.6s
=> [vllm-base 4/10] RUN apt-get update -y && apt-get install -y python3-pip git vim curl libibverbs-dev 103.8s
=> [base 2/13] RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections && echo 'tzdata tzdata/Zones/America select Los_Angeles' | debconf-se 195.4s
=> [vllm-base 5/10] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 18.2s
=> [vllm-base 6/10] RUN python3 -m pip --version 1.3s
=> [vllm-base 7/10] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/ 0.2s
=> [base 3/13] RUN apt-get update -y && apt-get install -y git curl sudo 41.1s
=> [base 4/13] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 24.3s
=> [base 5/13] RUN python3 -m pip --version 1.2s
=> [base 6/13] RUN ldconfig /usr/local/cuda-$(echo 12.4.1 | cut -d. -f1,2)/compat/ 0.2s
=> [base 7/13] WORKDIR /workspace 0.0s
=> [base 8/13] COPY requirements-common.txt requirements-common.txt 0.0s
=> [base 9/13] COPY requirements-cuda.txt requirements-cuda.txt 0.0s
=> [base 10/13] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install -r requirements-cuda.txt 741.4s
=> [base 11/13] COPY requirements-mamba.txt requirements-mamba.txt 0.4s
=> [base 12/13] RUN python3 -m pip install packaging 6.9s
=> [base 13/13] RUN python3 -m pip install -r requirements-mamba.txt 138.6s
=> [dev 1/4] COPY requirements-lint.txt requirements-lint.txt 0.0s
=> [build 1/15] COPY requirements-build.txt requirements-build.txt 0.0s
=> [dev 2/4] COPY requirements-test.txt requirements-test.txt 0.0s
=> [build 2/15] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install -r requirements-build.txt 6.9s
=> [dev 3/4] COPY requirements-dev.txt requirements-dev.txt 0.0s
=> [dev 4/4] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install -r requirements-dev.txt 190.6s
=> [build 3/15] RUN apt-get update -y && apt-get install -y ccache 28.2s
=> [build 4/15] COPY csrc csrc 0.0s
=> [build 5/15] COPY setup.py setup.py 0.0s
=> [build 6/15] COPY cmake cmake 0.0s
=> [build 7/15] COPY CMakeLists.txt CMakeLists.txt 0.0s
=> [build 8/15] COPY requirements-common.txt requirements-common.txt 0.0s
=> [build 9/15] COPY requirements-cuda.txt requirements-cuda.txt 0.0s
=> [build 10/15] COPY pyproject.toml pyproject.toml 0.0s
=> [build 11/15] COPY vllm vllm 0.1s
=> [build 12/15] RUN --mount=type=cache,target=/root/.cache/pip if [ "$USE_SCCACHE" = "1" ]; then echo "Installing sccache..." && curl -L -o s 0.1s
=> ERROR [build 13/15] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/pip if [ "$USE_SCCACHE" != "1" ]; then 486.5s
=> [mamba-builder 1/3] WORKDIR /usr/src/mamba 0.2s
=> [mamba-builder 2/3] COPY requirements-mamba.txt requirements-mamba.txt 0.0s
=> [mamba-builder 3/3] RUN pip --verbose wheel -r requirements-mamba.txt --no-build-isolation --no-deps --no-cache-dir 136.2s
------
> [build 13/15] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/pip if [ "$USE_SCCACHE" != "1" ]; then python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; fi:
13.12 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
13.54 running bdist_wheel
13.69 running build
13.69 running build_py
13.74 creating build
13.74 creating build/lib.linux-x86_64-cpython-310
13.74 creating build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/envs.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/block.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/utils.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/pooling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-310/vllm
13.74 copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/config.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/version.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/connections.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/logger.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 copying vllm/commit_id.py -> build/lib.linux-x86_64-cpython-310/vllm
13.75 creating build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 copying vllm/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 copying vllm/triton_utils/custom_cache_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/triton_utils
13.75 creating build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.75 copying vllm/engine/async_timeout.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.76 copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
13.76 creating build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/xpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/worker_base.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/openvino_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/tpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/neuron_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/neuron_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/xpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/cpu_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.76 copying vllm/worker/embedding_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/cpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/openvino_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/model_runner_base.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/tpu_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
13.77 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/custom_op.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 copying vllm/model_executor/pooling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
13.77 creating build/lib.linux-x86_64-cpython-310/vllm/core
13.77 copying vllm/core/block_manager_v1.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.77 copying vllm/core/evictor_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/evictor_v1.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/embedding_model_block_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/block_manager_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 copying vllm/core/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/core
13.78 creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/chat_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 copying vllm/entrypoints/logger.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
13.78 creating build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/request.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/models.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 copying vllm/adapter_commons/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/adapter_commons
13.79 creating build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/image.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/base.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 copying vllm/assets/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/assets
13.79 creating build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/spec_decode_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/smaller_tp_proposer_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.79 copying vllm/spec_decode/target_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/ngram_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/proposer_worker_base.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/batch_expansion.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/top1_proposer.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/mlp_speculator_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/util.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/medusa_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/multi_step_worker.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/draft_model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 copying vllm/spec_decode/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/spec_decode
13.80 creating build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/request.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/models.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.80 copying vllm/prompt_adapter/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.81 copying vllm/prompt_adapter/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/prompt_adapter
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 copying vllm/logging/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 copying vllm/logging/formatter.py -> build/lib.linux-x86_64-cpython-310/vllm/logging
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 copying vllm/usage/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 copying vllm/usage/usage_lib.py -> build/lib.linux-x86_64-cpython-310/vllm/usage
13.81 creating build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/parallel_state.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.81 copying vllm/distributed/communication_op.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.82 copying vllm/distributed/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed
13.82 creating build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/ray_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/multiproc_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/cpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/distributed_gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/executor_base.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/tpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/ray_tpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/gpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/xpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/openvino_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/neuron_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.82 copying vllm/executor/multiproc_worker_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 copying vllm/executor/ray_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 copying vllm/executor/ray_xpu_executor.py -> build/lib.linux-x86_64-cpython-310/vllm/executor
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/selector.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 copying vllm/attention/layer.py -> build/lib.linux-x86_64-cpython-310/vllm/attention
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/registry.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/image.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/base.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 copying vllm/multimodal/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/multimodal
13.83 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.83 copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.83 copying vllm/transformers_utils/image_processor.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/detokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/registry.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/data.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 copying vllm/inputs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/inputs
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/server
13.84 copying vllm/server/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/server
13.84 copying vllm/server/launch.py -> build/lib.linux-x86_64-cpython-310/vllm/server
13.84 creating build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.84 copying vllm/lora/punica.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/fully_sharded_layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
13.85 creating build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/tpu.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/rocm.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/cuda.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 copying vllm/platforms/interface.py -> build/lib.linux-x86_64-cpython-310/vllm/platforms
13.85 creating build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/multi_step.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.85 copying vllm/engine/output_processor/stop_checker.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/util.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 copying vllm/engine/output_processor/single_step.py -> build/lib.linux-x86_64-cpython-310/vllm/engine/output_processor
13.86 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/weight_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/neuron.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/loader.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/tensorizer.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/openvino.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 copying vllm/model_executor/model_loader/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/model_loader
13.86 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.86 copying vllm/model_executor/guided_decoding/outlines_decoding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.86 copying vllm/model_executor/guided_decoding/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 copying vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 copying vllm/model_executor/guided_decoding/outlines_logits_processors.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/guided_decoding
13.87 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/pooler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/rejection_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/rotary_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/logits_processor.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/typical_acceptance_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.87 copying vllm/model_executor/layers/spec_decode_base_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.88 copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
13.88 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/llava.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/xverse.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/intern_vit.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.88 copying vllm/model_executor/models/llama_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/commandr.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/blip2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/fuyu.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gemma2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/mlp_speculator.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/arctic.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/llava_next.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/jamba.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/phi3_small.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.89 copying vllm/model_executor/models/phi3v.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/paligemma.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/olmo.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/internlm2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/decilm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/medusa.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/mixtral_quant.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/persimmon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/gemma.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/clip.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/deepseek_v2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.90 copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/chameleon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/minicpm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/dbrx.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/minicpmv.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/internvl.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/nemotron.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/jais.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/orion.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/blip.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.91 copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
13.92 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/aqlm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/deepspeedfp.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.92 copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
13.93 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/rand.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 copying vllm/model_executor/layers/ops/sample.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops
13.93 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.93 copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe
13.94 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils
13.94 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.94 copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors
13.95 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_unquantized.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes
13.95 creating build/lib.linux-x86_64-cpython-310/vllm/core/block
13.95 copying vllm/core/block/block_table.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.95 copying vllm/core/block/prefix_caching_block.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/cpu_gpu_block_allocator.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/common.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/interfaces.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 copying vllm/core/block/naive_block.py -> build/lib.linux-x86_64-cpython-310/vllm/core/block
13.96 creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.96 copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/run_batch.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/serving_tokenization.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/cli_args.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
13.97 creating build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/pynccl.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/tpu_communicator.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/shm_broadcast.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/cuda_wrapper.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/custom_all_reduce_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 copying vllm/distributed/device_communicators/pynccl_wrapper.py -> build/lib.linux-x86_64-cpython-310/vllm/distributed/device_communicators
13.97 creating build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/ipex_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/blocksparse_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/rocm_flash_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/abstract.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/openvino.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/flashinfer.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/pallas.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 copying vllm/attention/backends/torch_sdpa.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/backends
13.98 creating build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.98 copying vllm/attention/ops/ipex_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.98 copying vllm/attention/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/triton_flash_attention.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/paged_attn.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 copying vllm/attention/ops/prefix_prefill.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops
13.99 creating build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/interface.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 copying vllm/attention/ops/blocksparse_attention/blocksparse_attention_kernel.py -> build/lib.linux-x86_64-cpython-310/vllm/attention/ops/blocksparse_attention
13.99 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/base_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/ray_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
13.99 copying vllm/transformers_utils/tokenizer_group/tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group
14.00 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
14.00 creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/dbrx.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/internvl.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.00 copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.01 copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
14.03 copying vllm/py.typed -> build/lib.linux-x86_64-cpython-310/vllm
14.03 creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.03 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.04 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.05 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs
14.08 running build_ext
14.91 Using MAX_JOBS=2 as the number of jobs.
14.99 Using NVCC_THREADS=8 as the number of nvcc threads.
17.09 -- The CXX compiler identification is GNU 9.4.0
17.61 -- Detecting CXX compiler ABI info
20.34 -- Detecting CXX compiler ABI info - done
20.52 -- Check for working CXX compiler: /usr/bin/c++ - skipped
20.52 -- Detecting CXX compile features
20.53 -- Detecting CXX compile features - done
20.53 -- Build type: RelWithDebInfo
20.53 -- Target device: cuda
22.37 -- Found Python: /usr/bin/python3 (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule
22.38 -- Found python matching: /usr/bin/python3.
34.56 -- Found CUDA: /usr/local/cuda (found version "12.4")
41.94 -- The CUDA compiler identification is NVIDIA 12.4.131
42.01 -- Detecting CUDA compiler ABI info
49.27 -- Detecting CUDA compiler ABI info - done
49.83 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
49.90 -- Detecting CUDA compile features
49.90 -- Detecting CUDA compile features - done
49.92 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.131")
49.95 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
52.41 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
52.41 -- Looking for pthread_create in pthreads
54.68 -- Looking for pthread_create in pthreads - not found
54.68 -- Looking for pthread_create in pthread
57.05 -- Looking for pthread_create in pthread - found
57.06 -- Found Threads: TRUE
57.19 -- Caffe2: CUDA detected: 12.4
57.19 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
57.19 -- Caffe2: CUDA toolkit directory: /usr/local/cuda
59.76 -- Caffe2: Header version is: 12.4
59.77 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:184 (message):
59.77 Failed to compute shorthash for libnvrtc.so
59.77 Call Stack (most recent call first):
59.77 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
59.77 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
59.77 CMakeLists.txt:67 (find_package)
59.77
59.77
59.77 -- USE_CUDNN is set to 0. Compiling without cuDNN support
59.77 -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
59.77 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/utils.cmake:385 (message):
59.77 In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
59.77 to cmake instead of implicitly setting it as an env variable. This will
59.77 become a FATAL_ERROR in future version of pytorch.
59.77 Call Stack (most recent call first):
59.77 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:342 (torch_cuda_get_nvcc_gencode_flag)
59.77 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
59.77 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
59.77 CMakeLists.txt:67 (find_package)
59.77
59.77
59.78 -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
59.81 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
59.81 static library kineto_LIBRARY-NOTFOUND not found.
59.81 Call Stack (most recent call first):
59.81 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
59.81 CMakeLists.txt:67 (find_package)
59.81
59.81
59.82 -- Found Torch: /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so
59.82 -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
59.83 -- CUDA target arches: 70-real;75-real;80-real;86-real;89-real;90-real;90-virtual
88.38 -- CMake Version: 3.30.1
88.38 -- CUTLASS 3.5.0
88.38 -- CUDART: /usr/local/cuda/lib64/libcudart.so
88.38 -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
88.38 -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so
88.40 -- Default Install Location: install
90.20 -- Found Python3: /usr/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
90.22 -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
90.22 -- Enable caching of reference results in conv unit tests
90.22 -- Enable rigorous conv problem sizes in conv unit tests
90.23 -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
90.36 fatal: not a git repository (or any of the parent directories): .git
90.36 -- CUTLASS Revision: Unable to detect, Git returned code 128.
90.39 -- Configuring cublas ...
90.39 -- cuBLAS Disabled.
90.39 -- Configuring cuBLAS ... done.
103.8 -- Completed generation of library instances. See /workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-build/tools/library/library_instance_generation.log for more information.
111.6 -- Punica target arches: 80-real;86-real;89-real;90-real;90-virtual
111.6 -- Enabling C extension.
111.6 -- Enabling moe extension.
111.6 -- Enabling punica extension.
111.7 -- Configuring done (96.0s)
125.5 -- Generating done (13.8s)
125.5 -- Build files have been written to: /workspace/build/temp.linux-x86_64-cpython-310
125.8 Using MAX_JOBS=2 as the number of jobs.
125.9 Using NVCC_THREADS=8 as the number of nvcc threads.
127.5 [0/2] Re-checking globbed directories...
267.8 [1/38] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
480.5 [2/38] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
480.5 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
480.5 ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/workspace/csrc -I/workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -I/workspace/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/tools/util/include -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=8 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/cache_kernels.cu.o -MF CMakeFiles/_C.dir/csrc/cache_kernels.cu.o.d -x cu -c /workspace/csrc/cache_kernels.cu -o CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2"
480.5 }
480.5 ^
480.5
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat16, __nv_bfloat16)"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(__nv_bfloat162, __nv_bfloat162)"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat16, A=__nv_bfloat16, B=__nv_bfloat16]"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=__nv_bfloat162, A=__nv_bfloat162, B=__nv_bfloat162]"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat162, __nv_bfloat162, __nv_bfloat162)"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(__nv_bfloat16, __nv_bfloat162, __nv_bfloat162)"
480.5 }
480.5 ^
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, __nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=__nv_bfloat16]"
480.5 }
480.5 ^
480.5
480.5 Killed
480.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
480.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 374 of /workspace/csrc/cache_kernels.cu
480.5
480.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 376 of /workspace/csrc/cache_kernels.cu
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=__nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 378 of /workspace/csrc/cache_kernels.cu
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 380 of /workspace/csrc/cache_kernels.cu
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 382 of /workspace/csrc/cache_kernels.cu
480.5
480.5 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
480.5 }
480.5 ^
480.5 detected during:
480.5 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 343 of /workspace/csrc/cache_kernels.cu
480.5 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=__nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 384 of /workspace/csrc/cache_kernels.cu
480.5
480.5 ninja: build stopped: subcommand failed.
481.9 Traceback (most recent call last):
482.0 File "/workspace/setup.py", line 459, in <module>
482.1 setup(
482.1 File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 108, in setup
482.1 return distutils.core.setup(**attrs)
482.1 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 184, in setup
482.1 return run_commands(dist)
482.1 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 200, in run_commands
482.1 dist.run_commands()
482.1 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 970, in run_commands
482.2 self.run_command(cmd)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2 super().run_command(command)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2 cmd_obj.run()
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_wheel.py", line 373, in run
482.2 self.run_command("build")
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
482.2 self.distribution.run_command(command)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2 super().run_command(command)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2 cmd_obj.run()
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
482.2 self.run_command(cmd_name)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
482.2 self.distribution.run_command(command)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 945, in run_command
482.2 super().run_command(command)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 989, in run_command
482.2 cmd_obj.run()
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 93, in run
482.2 _build_ext.run(self)
482.2 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
482.2 self.build_extensions()
482.2 File "/workspace/setup.py", line 234, in build_extensions
482.2 subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
482.2 File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
482.2 raise CalledProcessError(retcode, cmd)
482.2 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=1', '--target=_moe_C', '--target=_C', '--target=_punica_C']' returned non-zero exit status 1.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
2 warnings found (use --debug to expand):
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 133)
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 143)
Dockerfile:120
--------------------
119 | ENV CCACHE_DIR=/root/.cache/ccache
120 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
121 | >>> --mount=type=cache,target=/root/.cache/pip \
122 | >>> if [ "$USE_SCCACHE" != "1" ]; then \
123 | >>> python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \
124 | >>> fi
125 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; fi" did not complete successfully: exit code: 1
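The repeated `Killed` lines alongside the jemalloc QEMU messages suggest the compiler processes are being terminated by the OOM killer while running under emulation. A lower-memory invocation can be sketched as follows; this assumes the Dockerfile exposes `max_jobs` and `nvcc_threads` build args, which the `Using MAX_JOBS=2` / `Using NVCC_THREADS=8` log lines suggest it does:

```shell
# Sketch: reduce build parallelism to lower peak memory use.
# Assumption: max_jobs / nvcc_threads are ARGs in the Dockerfile
# (inferred from the "Using MAX_JOBS=..." lines in the log above).
docker build . --target vllm-base \
  --build-arg max_jobs=1 \
  --build-arg nvcc_threads=1
```

Raising the memory limit of the Docker VM (e.g. in Docker Desktop settings) is the other lever, since emulated nvcc jobs are substantially more memory-hungry than native ones.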
What system are you trying to build this on? The first two posts suggest it's an ARM64 (aarch64) system...
Building on arm64, targeting amd64 (so the first two errors aren't relevant to my exact use case, though they probably still shouldn't occur?). In any case, you can see the build also fails for amd64.
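For reference, the arm64-host/amd64-target build described above can be sketched like this (assumes Docker buildx with QEMU binfmt emulation is already set up, e.g. via Docker Desktop):

```shell
# Hypothetical cross-build sketch: produce the amd64 image on an arm64 host.
# Emulated compiles are slow and memory-hungry, which matches the
# jemalloc/QEMU messages in the log above.
docker buildx build . \
  --platform linux/amd64 \
  --target vllm-base \
  --load
```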
I got the same issue.
I am experiencing the same issue.
Same error. It freezes after the message.
Your current environment
Not applicable -- Dockerfile.
🐛 Describe the bug
Steps to reproduce: from the vllm repo, run:
docker build . --target vllm-base