vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: Installation fails on Neuron installation with newest build #3509

Closed: jimburtoft closed this issue 1 day ago

jimburtoft commented 8 months ago

Your current environment

The environment collection script failed inside my virtual environment, so I ran it outside of it.

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1031-aws-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   48 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          192
On-line CPU(s) list:             0-191
Vendor ID:                       AuthenticAMD
Model name:                      AMD EPYC 7R13 Processor
CPU family:                      25
Model:                           1
Thread(s) per core:              2
Core(s) per socket:              48
Socket(s):                       2
Stepping:                        1
BogoMIPS:                        5299.99
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       3 MiB (96 instances)
L1i cache:                       3 MiB (96 instances)
L2 cache:                        48 MiB (96 instances)
L3 cache:                        384 MiB (12 instances)
NUMA node(s):                    4
NUMA node0 CPU(s):               0-23,96-119
NUMA node1 CPU(s):               24-47,120-143
NUMA node2 CPU(s):               48-71,144-167
NUMA node3 CPU(s):               72-95,168-191
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] No relevant packages
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version:
instance-type: inf2.48xlarge
instance-id: i-0007c196eba7012c0
+--------+--------+--------+-----------+---------+
| NEURON | NEURON | NEURON | CONNECTED |   PCI   |
| DEVICE | CORES  | MEMORY |  DEVICES  |   BDF   |
+--------+--------+--------+-----------+---------+
| 0      | 2      | 32 GB  | 11, 1     | 80:1e.0 |
| 1      | 2      | 32 GB  | 0, 2      | 90:1e.0 |
| 2      | 2      | 32 GB  | 1, 3      | 80:1d.0 |
| 3      | 2      | 32 GB  | 2, 4      | 90:1f.0 |
| 4      | 2      | 32 GB  | 3, 5      | 80:1f.0 |
| 5      | 2      | 32 GB  | 4, 6      | 90:1d.0 |
| 6      | 2      | 32 GB  | 5, 7      | 20:1e.0 |
| 7      | 2      | 32 GB  | 6, 8      | 20:1f.0 |
| 8      | 2      | 32 GB  | 7, 9      | 10:1e.0 |
| 9      | 2      | 32 GB  | 8, 10     | 10:1f.0 |
| 10     | 2      | 32 GB  | 9, 11     | 10:1d.0 |
| 11     | 2      | 32 GB  | 10, 0     | 20:1d.0 |
+--------+--------+--------+-----------+---------+
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How you are installing vllm


cd vllm
pip install -U -r requirements-neuron.txt
pip install .

This produces the following error:

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Processing /home/ubuntu/vllm
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      /tmp/pip-build-env-swu4w78o/overlay/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      Traceback (most recent call last):
        File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-swu4w78o/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-swu4w78o/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-swu4w78o/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 340, in <module>
        File "<string>", line 266, in get_vllm_version
        File "<string>", line 237, in get_nvcc_cuda_version
      TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
jimburtoft commented 8 months ago

It looks like this might be related to PR-2671

I changed these lines around line 171 in setup.py:

def _is_cuda() -> bool:
    return (torch.version.cuda is not None) and not _is_neuron()

And then the process finished.

Keep in mind that I don't have CUDA installed on my system.
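
For context, the TypeError in the output above comes from setup.py assembling a path to nvcc on a machine that has no CUDA toolkit, where CUDA_HOME resolves to None. A rough sketch of the failure pattern (a hypothetical simplification, not the actual vLLM setup.py):

import subprocess
from torch.utils.cpp_extension import CUDA_HOME  # None when no CUDA toolkit is found

def get_nvcc_cuda_version(cuda_dir):
    # With cuda_dir == None, this concatenation raises
    # TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
    return subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"],
                                   universal_newlines=True)

get_nvcc_cuda_version(CUDA_HOME)

Guarding the CUDA path on torch.version.cuda, as in the change above, skips this call entirely on CUDA-less systems.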

liangfu commented 8 months ago

Sorry for the inconvenience. We need to make sure the neuron-ls and neuronx-cc commands are correctly installed on the system before running pip install vllm.
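
If it helps, a quick pre-flight check before running pip install could look like this (a minimal sketch, assuming both tools are expected on PATH and that neuronx-cc accepts a --version flag):

import shutil
import subprocess

# Confirm the Neuron tooling is reachable before attempting to build vLLM.
for tool in ("neuron-ls", "neuronx-cc"):
    if shutil.which(tool) is None:
        raise SystemExit(f"{tool} not found on PATH; install the Neuron SDK first")

# Run the compiler's version command as a sanity check
# (assumption: neuronx-cc supports --version).
subprocess.run(["neuronx-cc", "--version"], check=True)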

Legion2 commented 8 months ago

I have the same problem. I'm trying to create a Docker image that I can then deploy on AWS inf2 instances. However, I don't have access to a Neuron instance during the Docker image build. I have installed neuron-ls and neuronx-cc in the image, but of course when neuron-ls is executed during the Docker build there is no Neuron device available.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] commented 1 day ago

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!