vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: vLLM source install on rocm 6.2 still requires libamdhip64.so.6 #8004

Closed gounley closed 2 months ago

gounley commented 2 months ago

Your current environment

Collecting environment information...
WARNING 08-29 12:39:01 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
PyTorch version: 2.5.0.dev20240821+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-dd7f95766

OS: SUSE Linux Enterprise Server 15 SP5 (x86_64)
GCC version: (SUSE Linux) 12.3.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31

Python version: 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.14.21-150500.55.49_13.0.57-cray_shasta_c-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI210 (gfx90a:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.2.41133
MIOpen runtime version: 3.2.0
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               128
On-line CPU(s) list:                  0-127
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC 7763 64-Core Processor
CPU family:                           25
Model:                                1
Thread(s) per core:                   2
Core(s) per socket:                   64
Socket(s):                            1
Stepping:                             1
BogoMIPS:                             4890.48
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
L1d cache:                            2 MiB (64 instances)
L1i cache:                            2 MiB (64 instances)
L2 cache:                             32 MiB (64 instances)
L3 cache:                             256 MiB (8 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-15,64-79
NUMA node1 CPU(s):                    16-31,80-95
NUMA node2 CPU(s):                    32-47,96-111
NUMA node3 CPU(s):                    48-63,112-127
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-ml-py==12.560.30
[pip3] optree==0.12.1
[pip3] pytorch-triton-rocm==3.0.0+21eae954ef
[pip3] pyzmq==26.1.1
[pip3] torch==2.5.0.dev20240821+rocm6.2
[pip3] torcheval==0.0.7
[pip3] torchmetrics==1.4.1
[pip3] transformers==4.44.1
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] optree                    0.12.1                   pypi_0    pypi
[conda] pytorch-triton-rocm       3.0.0+21eae954ef          pypi_0    pypi
[conda] pyzmq                     26.1.1                   pypi_0    pypi
[conda] torch                     2.5.0.dev20240821+rocm6.2          pypi_0    pypi
[conda] torcheval                 0.0.7                    pypi_0    pypi
[conda] torchmetrics              1.4.1                    pypi_0    pypi
[conda] transformers              4.44.1                   pypi_0    pypi
ROCM Version: 6.2.41133-dd7f95766
Neuron SDK Version: N/A
vLLM Version: 0.5.5@f205c09854853172a446c92aa81eb7199da324ab
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How you are installing vllm

pip install --upgrade pip

# Install PyTorch
pip uninstall torch -y
pip install --no-cache-dir --pre torch==2.5.0.dev20240726 --index-url https://download.pytorch.org/whl/nightly/rocm6.1

# Build & install AMD SMI
pip install /opt/rocm/share/amd_smi

# Install dependencies
pip install --upgrade numba scipy huggingface-hub[cli]
pip install "numpy<2"
pip install -r requirements-rocm.txt

# Apply the patch to ROCM 6.1 (requires root permission)
wget -N https://github.com/ROCm/vllm/raw/fa78403/rocm_patch/libamdhip64.so.6 -P /opt/rocm/lib
rm -f "$(python3 -c 'import torch; print(torch.__path__[0])')"/lib/libamdhip64.so*

# Build vLLM for MI210/MI250/MI300.
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
python3 setup.py develop


gounley commented 2 months ago

Based on https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm#L145, I didn't expect that manually installing libamdhip64.so.6 would be required with rocm 6.2. However, the following error still results without it:

ninja: error: '/opt/rocm/lib/libamdhip64.so', needed by '/full_path_here/vllm/_core_C.abi3.so', missing and no known rule to make it
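
The ninja error names the unversioned `/opt/rocm/lib/libamdhip64.so`, not the `libamdhip64.so.6` file fetched by the patch step. A minimal sketch for checking which of the two is actually present is below. It is only a diagnostic, not a fix. The function name `check_hip_lib` is hypothetical, and the default directory is taken from the error message above.

```shell
# Hypothetical helper (not part of vLLM or ROCm): report whether the
# unversioned HIP runtime library that ninja asks for exists in a given
# ROCm lib directory, and list any versioned copies found next to it.
check_hip_lib() {
    rocm_lib_dir="${1:-/opt/rocm/lib}"   # default matches the path in the ninja error
    target="$rocm_lib_dir/libamdhip64.so"

    # -e follows symlinks, so a dangling symlink is reported as missing.
    if [ -e "$target" ]; then
        echo "found: $target"
        return 0
    fi

    echo "missing: $target"
    # Show versioned files such as libamdhip64.so.6 if any are present.
    for candidate in "$rocm_lib_dir"/libamdhip64.so.*; do
        if [ -e "$candidate" ]; then
            echo "candidate: $candidate"
        fi
    done
    return 1
}

# Example call; "|| true" keeps a missing library from aborting the shell.
check_hip_lib /opt/rocm/lib || true
```

If the output shows only `candidate:` lines, the versioned library is installed but the unversioned name that the build links against is not.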

youkaichao commented 2 months ago

cc @hongxiayang

hongxiayang commented 2 months ago

The documented steps for building vLLM locally do not apply to a ROCm 6.2 installation. We will look into this.

hongxiayang commented 2 months ago

This is a duplicate of https://github.com/vllm-project/vllm/issues/8042. Closing this one, since we have been discussing workarounds in the other issue.