vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: vllm ROCm failed to build on Docker. #8955

Open limyenkai opened 2 weeks ago

limyenkai commented 2 weeks ago

Your current environment

The output of `python collect_env.py`

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-45-generic-x86_64-with-glibc2.39
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               128
On-line CPU(s) list:                  0-127
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC 9334 32-Core Processor
CPU family:                           25
Model:                                17
Thread(s) per core:                   2
Core(s) per socket:                   32
Socket(s):                            2
Stepping:                             1
BogoMIPS:                             5399.99
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
Virtualization:                       AMD-V
L1d cache:                            2 MiB (64 instances)
L1i cache:                            2 MiB (64 instances)
L2 cache:                             64 MiB (64 instances)
L3 cache:                             256 MiB (8 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-31,64-95
NUMA node1 CPU(s):                    32-63,96-127
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] pyzmq==26.2.0
[conda] pyzmq                     26.2.0                   pypi_0    pypi
ROCM Version: 6.2.41134-65d174c3e
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
       GPU0
GPU0   0

================================= Hops between two GPUs ==================================
       GPU0
GPU0   0

=============================== Link Type between two GPUs ===============================
       GPU0
GPU0   0

======================================= Numa Nodes =======================================
GPU[0]          : (Topology) Numa Node: 0
GPU[0]          : (Topology) Numa Affinity: 0
================================== End of ROCm SMI Log ===================================

How you are installing vllm

sudo DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
 > [build_triton 1/1] RUN --mount=type=cache,target=/root/.cache/ccache     if [ "1" = "1" ]; then     mkdir -p libs     && cd libs     && python3 -m pip install ninja cmake wheel pybind11     && git clone https://github.com/OpenAI/triton.git     && cd triton     && git checkout "e192dba"     && cd python     && python3 setup.py bdist_wheel --dist-dir=/install;     else mkdir -p /install;     fi:
5.519 Collecting ninja
10.54   Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
10.64 Collecting cmake
10.65   Downloading cmake-3.30.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.4 kB)
10.65 Requirement already satisfied: wheel in /opt/conda/envs/py_3.9/lib/python3.9/site-packages (0.43.0)
10.67 Collecting pybind11
10.67   Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
10.68 Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
10.71 Downloading cmake-3.30.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
11.54    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 32.5 MB/s eta 0:00:00
11.54 Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
11.77 Installing collected packages: ninja, pybind11, cmake
12.45 Successfully installed cmake-3.30.4 ninja-1.11.1.1 pybind11-2.13.6
12.45 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
12.50 Cloning into 'triton'...
41.57 Note: switching to 'e192dba'.
41.57
41.57 You are in 'detached HEAD' state. You can look around, make experimental
41.57 changes and commit them, and you can discard any commits you make in this
41.57 state without impacting any branches by switching back to a branch.
41.57
41.57 If you want to create a new branch to retain commits you create, you may
41.57 do so (now or later) by using -c with the switch command. Example:
41.57
41.57   git switch -c <new-branch-name>
41.57
41.57 Or undo this operation with:
41.57
41.57   git switch -
41.57
41.57 Turn off this advice by setting config variable advice.detachedHead to false
41.57
41.57 HEAD is now at e192dba22 [AMD] Hoist Q out of the loop for FA optimization (#4666)
226.8 downloading and extracting https://anaconda.org/nvidia/cuda-nvcc/12.4.99/download/linux-64/cuda-nvcc-12.4.99-0.tar.bz2 ...
226.8 Traceback (most recent call last):
226.8   File "/vllm-workspace/libs/triton/python/setup.py", line 489, in <module>
226.8     download_and_copy(
226.8   File "/vllm-workspace/libs/triton/python/setup.py", line 288, in download_and_copy
226.8     file = tarfile.open(fileobj=open_url(url), mode="r|*")
226.8   File "/vllm-workspace/libs/triton/python/setup.py", line 209, in open_url
226.8     return urllib.request.urlopen(request, timeout=300)
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
226.8     return opener.open(url, data, timeout)
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 523, in open
226.8     response = meth(req, response)
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 632, in http_response
226.8     response = self.parent.error(
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 561, in error
226.8     return self._call_chain(*args)
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
226.8     result = func(*args)
226.8   File "/opt/conda/envs/py_3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
226.8     raise HTTPError(req.full_url, code, msg, hdrs, fp)
226.8 urllib.error.HTTPError: HTTP Error 524:
------

 1 warning found (use docker --debug to expand):
 - UndefinedVar: Usage of undefined variable '$CPLUS_INCLUDE_PATH' (line 63)
Dockerfile.rocm:101
--------------------
 100 |     # Build triton wheel if `BUILD_TRITON = 1`
 101 | >>> RUN --mount=type=cache,target=${CCACHE_DIR} \
 102 | >>>     if [ "$BUILD_TRITON" = "1" ]; then \
 103 | >>>     mkdir -p libs \
 104 | >>>     && cd libs \
 105 | >>>     && python3 -m pip install ninja cmake wheel pybind11 \
 106 | >>>     && git clone https://github.com/OpenAI/triton.git \
 107 | >>>     && cd triton \
 108 | >>>     && git checkout "${TRITON_BRANCH}" \
 109 | >>>     && cd python \
 110 | >>>     && python3 setup.py bdist_wheel --dist-dir=/install; \
 111 | >>>     # Create an empty directory otherwise as later build stages expect one
 112 | >>>     else mkdir -p /install; \
 113 | >>>     fi
 114 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$BUILD_TRITON\" = \"1\" ]; then     mkdir -p libs     && cd libs     && python3 -m pip install ninja cmake wheel pybind11     && git clone https://github.com/OpenAI/triton.git     && cd triton     && git checkout \"${TRITON_BRANCH}\"     && cd python     && python3 setup.py bdist_wheel --dist-dir=/install;     else mkdir -p /install;     fi" did not complete successfully: exit code: 1

vLLM fails to build. I tried downloading https://anaconda.org/nvidia/cuda-nvcc/12.4.99/download/linux-64/cuda-nvcc-12.4.99-0.tar.bz2 manually, and that worked. I haven't been able to find a fix yet; has anyone else run into this issue?


Eggwardhan commented 2 weeks ago

+1

benhaotang commented 6 days ago

The Triton repository moved from OpenAI/triton to triton-lang/triton. You have to update the clone URL in the Dockerfile accordingly. After that change, the build seems to run fine.
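The URL change described above can be applied with a single `sed` substitution. This is a minimal sketch, not an official fix: it assumes `Dockerfile.rocm` still contains the literal `https://github.com/OpenAI/triton.git` URL shown in the Dockerfile excerpt, and that GNU sed is available (for `-i` without a backup suffix). The helper name `patch_triton_url` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite the Triton clone URL in a vLLM ROCm Dockerfile.
# Assumes the file contains the literal URL https://github.com/OpenAI/triton.git
# and that GNU sed is installed (-i edits the file in place with no backup).
patch_triton_url() {
  local dockerfile="${1:-Dockerfile.rocm}"
  # '#' is used as the delimiter so the slashes in the URLs need no escaping;
  # the dots are escaped so they only match a literal '.'.
  sed -i 's#https://github\.com/OpenAI/triton\.git#https://github.com/triton-lang/triton.git#g' "$dockerfile"
}
```

From the vLLM checkout, run `patch_triton_url Dockerfile.rocm` and then re-run the same `docker build` command. Only the clone URL changes; the pinned `TRITON_BRANCH` checkout is left as-is.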