Closed · Mr-KenLee closed this 2 weeks ago
You can see this warning:

```text
WARNING 08-25 16:33:04 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.
```

I think you should uninstall `pynvml`.
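For reference, the usual fix is `pip uninstall pynvml` followed by `pip install nvidia-ml-py`. As a small, hedged sketch (not part of vLLM; the helper name is made up), you can check which of the two NVML bindings are currently installed using only the standard library:

```python
# Hypothetical helper: list which of the two NVML Python distributions
# ("pynvml" = deprecated, "nvidia-ml-py" = maintained) are installed,
# so you can verify the uninstall actually took effect.
from importlib.metadata import distributions

def installed_nvml_providers() -> set:
    """Return installed distribution names among the two NVML bindings."""
    names = set()
    for dist in distributions():
        # dist.metadata behaves like an email.Message; "Name" may be absent
        name = dist.metadata["Name"] if dist.metadata else None
        if name:
            names.add(name.lower())
    return names & {"pynvml", "nvidia-ml-py"}

print(sorted(installed_nvml_providers()))
```

An empty result means neither binding is installed; if both appear, the deprecated `pynvml` may still shadow the maintained module until it is removed.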
The bug you mention happens while trying to create a P2P cache file. When you switch to 0.5.4, the file is generated successfully; when you then switch back to 0.5.5, that part of the code no longer executes.
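The pattern described here — generate the cache file once, then skip the expensive probe on later runs — can be sketched roughly as follows (an illustration only, not vLLM's actual code; `probe` is a hypothetical stand-in for the real CUDA peer-access check, and the cache path is caller-supplied):

```python
# Sketch of a persisted P2P-check cache: probe peer access once per GPU
# pair, store the answer in a JSON file, and reuse the file on later runs.
import json
import os

def cached_p2p_check(src: int, tgt: int, cache_path: str, probe) -> bool:
    """Return whether GPU `src` can access GPU `tgt`, caching the answer on disk."""
    key = f"{src}->{tgt}"
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key not in cache:
        # The expensive check runs only when the file has no entry yet.
        cache[key] = bool(probe(src, tgt))
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[key]
```

Because the JSON file persists on disk, a version that only reads the cache still works as long as some earlier run (e.g. under 0.5.4) produced the file.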
I improved the warning message in https://github.com/vllm-project/vllm/pull/7852 , please take a look @Mr-KenLee
And https://github.com/vllm-project/vllm/pull/7853 should fix this problem. @Mr-KenLee please give it a try.
@youkaichao thank you very much! I will try it immediately.
Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 08-25 16:33:04 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.35

Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.14.105-1-tlinux3-0013-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB

Nvidia driver version: 450.80.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Address sizes:        43 bits physical, 48 bits virtual
Byte Order:           Little Endian
CPU(s):               192
On-line CPU(s) list:  0-191
Vendor ID:            AuthenticAMD
Model name:           AMD EPYC 7K62 48-Core Processor
CPU family:           23
Model:                49
Thread(s) per core:   2
Core(s) per socket:   48
Socket(s):            2
Stepping:             0
Frequency boost:      enabled
CPU max MHz:          2600.0000
CPU min MHz:          1500.0000
BogoMIPS:             5189.76
Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
Virtualization:       AMD-V
L1d cache:            3 MiB (96 instances)
L1i cache:            3 MiB (96 instances)
L2 cache:             48 MiB (96 instances)
L3 cache:             384 MiB (24 instances)
NUMA node(s):         2
NUMA node0 CPU(s):    0-47,96-143
NUMA node1 CPU(s):    48-95,144-191
Vulnerability L1tf:              Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.1.105
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pynvml==11.5.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.3.0+cu121
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] numpy                    1.26.3      pypi_0 pypi
[conda] nvidia-cublas-cu12       12.1.3.1    pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12   12.1.105    pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12   12.1.105    pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105    pypi_0 pypi
[conda] nvidia-cudnn-cu12        9.1.0.70    pypi_0 pypi
[conda] nvidia-cufft-cu12        11.0.2.54   pypi_0 pypi
[conda] nvidia-curand-cu12       10.3.2.106  pypi_0 pypi
[conda] nvidia-cusolver-cu12     11.4.5.107  pypi_0 pypi
[conda] nvidia-cusparse-cu12     12.1.0.106  pypi_0 pypi
[conda] nvidia-ml-py             12.560.30   pypi_0 pypi
[conda] nvidia-nccl-cu12         2.20.5      pypi_0 pypi
[conda] nvidia-nvjitlink-cu12    12.1.105    pypi_0 pypi
[conda] nvidia-nvtx-cu12         12.1.105    pypi_0 pypi
[conda] pynvml                   11.5.0      pypi_0 pypi
[conda] pyzmq                    26.2.0      pypi_0 pypi
[conda] torch                    2.4.0       pypi_0 pypi
[conda] torchaudio               2.3.0+cu121 pypi_0 pypi
[conda] torchvision              0.19.0      pypi_0 pypi
[conda] transformers             4.44.2      pypi_0 pypi
[conda] triton                   3.0.0       pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.5@09c7792610ada9f88bbf87d32b472dd44bf23cc2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0 GPU1 mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_4 mlx5_5 mlx5_6 mlx5_7 mlx5_8 mlx5_9 mlx5_10 mlx5_11 mlx5_12 mlx5_13 mlx5_14 mlx5_15 mlx5_16 mlx5_17 mlx5_18 mlx5_19 mlx5_20 mlx5_21 mlx5_22 mlx5_23 mlx5_24 mlx5_25 CPU Affinity NUMA Affinity
GPU0    X    NV12 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
GPU1    NV12 X    SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
mlx5_0  SYS  SYS  X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_1  SYS  SYS  PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_2  SYS  SYS  PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_3  SYS  SYS  PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_4  SYS  SYS  PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_5  SYS  SYS  PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_6  SYS  SYS  PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_7  SYS  SYS  PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_8  SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_9  SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_10 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_11 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_12 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_13 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_14 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_15 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_16 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_17 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_18 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX PIX
mlx5_19 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX PIX
mlx5_20 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX PIX
mlx5_21 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX PIX
mlx5_22 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX PIX
mlx5_23 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX PIX
mlx5_24 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X   PIX
mlx5_25 SYS  SYS  PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

🐛 Describe the bug
I attempted to use vllm==0.5.5 and ran the following script:
However, I encountered the following error while loading the model:
Interestingly, when I switched to vllm==0.5.4, the model loaded successfully. After that, switching back to vllm==0.5.5 also loads successfully. Could you please explain why this happens?
Before submitting a new issue...