vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: failed installation with pip #4980

Closed: adriacb closed this issue 1 month ago

adriacb commented 1 month ago

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.19 (main, May  6 2024, 19:43:03)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-106-generic-x86_64-with-glibc2.31
Is CUDA available: N/A
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000

Nvidia driver version: 535.161.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             72
On-line CPU(s) list:                0-71
Thread(s) per core:                 2
Core(s) per socket:                 18
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Stepping:                           7
CPU MHz:                            2600.000
CPU max MHz:                        3900.0000
CPU min MHz:                        1000.0000
BogoMIPS:                           5200.00
L1d cache:                          1.1 MiB
L1i cache:                          1.1 MiB
L2 cache:                           36 MiB
L3 cache:                           49.5 MiB
NUMA node0 CPU(s):                  0-17,36-53
NUMA node1 CPU(s):                  18-35,54-71
...

Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  NV2 0-17,36-53  0       N/A
GPU1    NV2  X  0-17,36-53  0       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How you are installing vllm

conda create -n test python=3.9 -y
conda activate test
pip install vllm

The error:

Defaulting to user installation because normal site-packages is not writeable
Collecting vllm
  Using cached vllm-0.4.2.tar.gz (588 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cmake>=3.21 (from vllm)
  Using cached cmake-3.29.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.1 kB)
Collecting ninja (from vllm)
  Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting psutil (from vllm)
  Using cached psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
Collecting sentencepiece (from vllm)
  Using cached sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Requirement already satisfied: numpy in /home/adria/.local/lib/python3.12/site-packages (from vllm) (1.26.4)
Requirement already satisfied: requests in /opt/conda/lib/python3.12/site-packages (from vllm) (2.31.0)
Collecting py-cpuinfo (from vllm)
  Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Requirement already satisfied: transformers>=4.40.0 in /home/adria/.local/lib/python3.12/site-packages (from vllm) (4.41.0)
Requirement already satisfied: tokenizers>=0.19.1 in /home/adria/.local/lib/python3.12/site-packages (from vllm) (0.19.1)
Collecting fastapi (from vllm)
  Using cached fastapi-0.111.0-py3-none-any.whl.metadata (25 kB)
Requirement already satisfied: openai in /home/adria/.local/lib/python3.12/site-packages (from vllm) (1.30.1)
Collecting uvicorn[standard] (from vllm)
  Using cached uvicorn-0.29.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic>=2.0 in /home/adria/.local/lib/python3.12/site-packages (from vllm) (2.7.1)
Collecting prometheus-client>=0.18.0 (from vllm)
  Using cached prometheus_client-0.20.0-py3-none-any.whl.metadata (1.8 kB)
Collecting prometheus-fastapi-instrumentator>=7.0.0 (from vllm)
  Using cached prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl.metadata (13 kB)
Collecting tiktoken==0.6.0 (from vllm)
  Using cached tiktoken-0.6.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting lm-format-enforcer==0.9.8 (from vllm)
  Using cached lm_format_enforcer-0.9.8-py3-none-any.whl.metadata (14 kB)
Collecting outlines==0.0.34 (from vllm)
  Using cached outlines-0.0.34-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: typing-extensions in /home/adria/.local/lib/python3.12/site-packages (from vllm) (4.11.0)
Requirement already satisfied: filelock>=3.10.4 in /home/adria/.local/lib/python3.12/site-packages (from vllm) (3.14.0)
INFO: pip is looking at multiple versions of vllm to determine which version is compatible with other requirements. This could take a while.
Collecting vllm
  Using cached vllm-0.4.1.tar.gz (534 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Using cached vllm-0.3.3.tar.gz (315 kB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Collecting ninja
        Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
      Collecting packaging
        Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
      Collecting setuptools>=49.4.0
        Using cached setuptools-70.0.0-py3-none-any.whl.metadata (5.9 kB)
      ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0)
      ERROR: No matching distribution found for torch==2.1.2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
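
The version list in that error is the key clue: torch 2.2.0 was the first PyTorch release to ship CPython 3.12 wheels, so a resolver running under Python 3.12 cannot see torch==2.1.2 at all. A quick way to check which versions are visible to the interpreter pip is actually using (pip's index subcommand, which is still marked experimental in recent pip releases):

python -m pip index versions torch    # lists only versions with wheels for the running Python
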
youkaichao commented 1 month ago

Using cached sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Requirement already satisfied: numpy in /home/adria/.local/lib/python3.12/site-packages (from vllm) (1.26.4)

Your pip is somehow using Python 3.12's packages (note the cp312 wheels and the /home/adria/.local/lib/python3.12 paths), not the conda env's Python 3.9. You need to figure out what's wrong with your environment.
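
A minimal way to check this, assuming the test env created above (all standard pip/conda commands):

which python               # should point into the conda env, e.g. .../envs/test/bin/python
which pip                  # if this points elsewhere (e.g. /opt/conda/bin/pip), that's the culprit
python --version           # should report Python 3.9.x
pip --version              # shows which interpreter this pip executable is bound to
python -m pip --version    # the env's own pip, invoked through its own interpreter
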

adriacb commented 1 month ago

Thank you!
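
For anyone hitting the same symptom: the "Defaulting to user installation because normal site-packages is not writeable" line at the top of the log is consistent with the diagnosis above, since a conda env's own site-packages is writeable and would never trigger a user install. A minimal workaround, assuming the test env from the report, is to bypass the shadowing pip by invoking pip through the env's interpreter:

conda activate test
python -m pip install vllm    # runs pip under the env's Python 3.9, not the stray Python 3.12 pip
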