
Error loading models since versions 0.6.1xxx #8745

Open IdoAmit198 opened 1 day ago

IdoAmit198 commented 1 day ago

Your current environment

PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.0
Libc version: glibc-2.31

Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-187-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
GPU 2: NVIDIA A40
GPU 3: NVIDIA A40

Nvidia driver version: 535.183.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 57 bits virtual
CPU(s):                             96
On-line CPU(s) list:                0-95
Thread(s) per core:                 2
Core(s) per socket:                 24
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              106
Model name:                         Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz
Stepping:                           6
CPU MHz:                            800.012
BogoMIPS:                           4800.00
Virtualization:                     VT-x
.
.
.

How you are installing vllm

pip install -U vllm

It seems that over the last few weeks a lot of crucial updates needed to use vLLM properly have been made; they exist in version 0.6.1 but are missing from version 0.6.1.post2. However, the version available through pip is the old 0.6.1.post2.

For example, #8157 possibly fixes issue #8553, which I am also hitting.
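
For reference, this is roughly how I checked which version pip resolves (the pip index subcommand is still marked experimental, but it works for listing published versions):

pip index versions vllm        # lists the versions available on PyPI
pip show vllm | grep Version   # shows the currently installed version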


An update: after installing version 0.6.1 via pip, I am still getting the error from issue #8553 when I try to initialize the model (which I had already downloaded through the Hugging Face interface):

Traceback (most recent call last): 
  File "<string>", line 1, in <module> 
  File "/home/ido.amit/miniconda3/envs/benchmark/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main 
    exitcode = _main(fd, parent_sentinel)                                                            
               ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                      
  File "/home/ido.amit/miniconda3/envs/benchmark/lib/python3.11/multiprocessing/spawn.py", line 132, in _main                  
    self = reduction.pickle.load(from_parent)                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                 
ModuleNotFoundError: No module named 'transformers_modules.microsoft.Phi-3'                                                   
ERROR 09-24 00:33:22 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3840118 died, exit code: 1                
INFO 09-24 00:33:22 multiproc_worker_utils.py:123] Killing local vLLM worker processes   
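
For completeness, the model initialization is roughly along these lines (the model ID and settings below are illustrative, not my exact script):

# Hypothetical minimal reproduction; the real script points at a local Phi-3 checkpoint.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3-mini-4k-instruct",  # illustrative Phi-3 variant
    trust_remote_code=True,                    # Phi-3 repos ship custom modeling code
    tensor_parallel_size=4,                    # spawns the worker processes shown in the log above
)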

Thanks in advance for the help!

DarkLight1337 commented 13 hours ago

If you want the latest fixes, you'll have to install vLLM from the main branch (i.e. from source)
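
For example, one way to do it (building from source needs a working CUDA toolchain and can take a while; see the installation docs for the full instructions):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .   # compiles the CUDA kernels from the current main branch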