vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: vLLM source install on rocm 6.2 still requires libamdhip64.so.6 #8004

Closed gounley closed 20 hours ago

gounley commented 2 weeks ago

Your current environment

Collecting environment information...
WARNING 08-29 12:39:01 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
PyTorch version: 2.5.0.dev20240821+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-dd7f95766

OS: SUSE Linux Enterprise Server 15 SP5 (x86_64)
GCC version: (SUSE Linux) 12.3.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31

Python version: 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.21-150500.55.49_13.0.57-cray_shasta_c-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI210 (gfx90a:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.2.41133
MIOpen runtime version: 3.2.0
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               128
On-line CPU(s) list:                  0-127
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC 7763 64-Core Processor
CPU family:                           25
Model:                                1
Thread(s) per core:                   2
Core(s) per socket:                   64
Socket(s):                            1
Stepping:                             1
BogoMIPS:                             4890.48
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
L1d cache:                            2 MiB (64 instances)
L1i cache:                            2 MiB (64 instances)
L2 cache:                             32 MiB (64 instances)
L3 cache:                             256 MiB (8 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-15,64-79
NUMA node1 CPU(s):                    16-31,80-95
NUMA node2 CPU(s):                    32-47,96-111
NUMA node3 CPU(s):                    48-63,112-127
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-ml-py==12.560.30
[pip3] optree==0.12.1
[pip3] pytorch-triton-rocm==3.0.0+21eae954ef
[pip3] pyzmq==26.1.1
[pip3] torch==2.5.0.dev20240821+rocm6.2
[pip3] torcheval==0.0.7
[pip3] torchmetrics==1.4.1
[pip3] transformers==4.44.1
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] optree                    0.12.1                   pypi_0    pypi
[conda] pytorch-triton-rocm       3.0.0+21eae954ef          pypi_0    pypi
[conda] pyzmq                     26.1.1                   pypi_0    pypi
[conda] torch                     2.5.0.dev20240821+rocm6.2          pypi_0    pypi
[conda] torcheval                 0.0.7                    pypi_0    pypi
[conda] torchmetrics              1.4.1                    pypi_0    pypi
[conda] transformers              4.44.1                   pypi_0    pypi
ROCM Version: 6.2.41133-dd7f95766
Neuron SDK Version: N/A
vLLM Version: 0.5.5@f205c09854853172a446c92aa81eb7199da324ab
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How you are installing vllm

pip install --upgrade pip

# Install PyTorch
pip uninstall torch -y
pip install --no-cache-dir --pre torch==2.5.0.dev20240726 --index-url https://download.pytorch.org/whl/nightly/rocm6.1

# Build & install AMD SMI
pip install /opt/rocm/share/amd_smi

# Install dependencies
pip install --upgrade numba scipy huggingface-hub[cli]
pip install "numpy<2"
pip install -r requirements-rocm.txt

# Apply the patch to ROCM 6.1 (requires root permission)
wget -N https://github.com/ROCm/vllm/raw/fa78403/rocm_patch/libamdhip64.so.6 -P /opt/rocm/lib
rm -f "$(python3 -c 'import torch; print(torch.__path__[0])')"/lib/libamdhip64.so*

# Build vLLM for MI210/MI250/MI300.
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
python3 setup.py develop

gounley commented 2 weeks ago

Based on https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm#L145, I didn't expect that manually installing libamdhip64.so.6 would be required with ROCm 6.2. However, without it the build still fails with the following error:

ninja: error: '/opt/rocm/lib/libamdhip64.so', needed by '/full_path_here/vllm/_core_C.abi3.so', missing and no known rule to make it
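
For anyone hitting the same ninja error, a quick way to see what the build is looking for is to check whether the unversioned libamdhip64.so exists under the ROCm lib directory. The snippet below is only a minimal sketch and not something from this thread: the function name check_hip_lib is hypothetical, and the default path /opt/rocm/lib is an assumption taken from the error message above.

```shell
# Hypothetical helper (not part of vLLM or ROCm): report whether the
# unversioned libamdhip64.so that ninja asks for is present.
# The default directory is an assumption based on the error message.
check_hip_lib() {
    rocm_lib="${1:-/opt/rocm/lib}"
    if [ -e "$rocm_lib/libamdhip64.so" ]; then
        echo "found: $rocm_lib/libamdhip64.so"
        return 0
    fi
    echo "missing: $rocm_lib/libamdhip64.so"
    # Show any versioned copies that do exist, if there are any.
    ls -l "$rocm_lib"/libamdhip64.so* 2>/dev/null || true
    return 1
}
```

Calling check_hip_lib with no argument inspects /opt/rocm/lib; passing a different directory checks that location instead. It only reports what is on disk and does not change anything.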

youkaichao commented 2 weeks ago

cc @hongxiayang

hongxiayang commented 1 week ago

The documented steps for building vLLM locally do not apply to a ROCm 6.2 installation. We will look into this.

hongxiayang commented 20 hours ago

This is a duplicate of https://github.com/vllm-project/vllm/issues/8042. Closing this one since we have been discussing workarounds in the other issue.