
[Installation]: vLLM on NVIDIA Jetson AGX Orin #5640

Open phgcha opened 3 weeks ago

phgcha commented 3 weeks ago

Your current environment

root@jetson:/workspace# python collect_env.py
  File "collect_env.py", line 724
    print(msg, file=sys.stderr)
                   ^
SyntaxError: invalid syntax
root@jetson:/workspace# python3 collect_env.py
Collecting environment information...
PyTorch version: 2.0.0a0+ec3941ad.nv23.02
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.25.2
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.120-tegra-aarch64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.315
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-7
Off-line CPU(s) list:            8-11
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       2
Vendor ID:                       ARM
Model:                           1
Model name:                      ARMv8 Processor rev 1 (v8l)
Stepping:                        r0p1
CPU max MHz:                     2201.6001
CPU min MHz:                     115.2000
BogoMIPS:                        62.50
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        2 MiB
L3 cache:                        4 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, but not BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==2.0.0a0+ec3941ad.nv23.2
[pip3] torchaudio==0.13.1+b90d798
[pip3] torchvision==0.14.1a0+5e8e2f1
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How you are installing vllm

pip install vllm

Hi,

I'm trying to install vLLM on my Jetson AGX Orin developer kit.

I'm using the image nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, and I get the following error when I run pip install vllm:

root@jetson:/workspace# pip install vllm
Collecting vllm
  Downloading vllm-0.5.0.post1.tar.gz (743 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 743.2/743.2 kB 12.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      Traceback (most recent call last):
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 415, in <module>
        File "<string>", line 341, in get_vllm_version
      RuntimeError: Unknown runtime environment
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

Note the error message Unknown runtime environment. I figured that this is thrown here: https://github.com/vllm-project/vllm/blob/main/setup.py#L347, because torch.version.cuda is None.
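
For reference, the check seems to look roughly like this (my paraphrased sketch with simplified names, not the exact upstream code):

# Paraphrased sketch of the backend detection in vllm's setup.py.
# If the torch visible to the build carries no CUDA (or ROCm) build
# metadata, version resolution fails with "Unknown runtime environment".
import torch

def detect_backend():
    if torch.version.cuda is not None:
        return "cuda"
    if getattr(torch.version, "hip", None) is not None:
        return "rocm"
    # neither CUDA nor ROCm metadata present -> the error I'm hitting
    raise RuntimeError("Unknown runtime environment")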

However, when I open a python3 prompt and check CUDA availability:

root@jetson:/workspace# python3
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
11.4
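
So the system torch clearly reports CUDA 11.4. My best guess (unverified) is that pip's build isolation installs a separate torch wheel from PyPI into a temporary build environment, and on aarch64 that wheel is a CPU-only build, so torch.version.cuda is None inside the build even though it is 11.4 here. If that is the cause, disabling build isolation so that setup.py imports the system torch might sidestep the check:

pip install vllm --no-build-isolation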

Any help would be appreciated. Thanks

youkaichao commented 3 weeks ago

I suggest you contact your support team. You are using a custom-built PyTorch, about which we can answer only very limited questions.

phgcha commented 3 weeks ago

Thanks, will do. But does anyone have a rough idea of what might have caused this?

Shatatel commented 2 weeks ago

Same issue here. It appears right after a fresh Jetson AGX flash.

Here is some software info:

jetson_release 

Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.0 [L4T 36.3.0]
NV Power Mode[0]: MAXN
Hardware:
 - Module: NVIDIA Jetson AGX Orin
Platform:
 - Distribution: Ubuntu 22.04 Jammy Jellyfish
 - Release: 5.15.136-tegra
jtop:
 - Version: 4.2.8
 - Service: Active
Libraries:
 - CUDA: 12.2.140
 - cuDNN: 8.9.4.25
 - TensorRT: 8.6.2.3
 - VPI: 3.1.5
 - Vulkan: 1.3.204
 - OpenCV: 4.10.0-dev - with CUDA: YES

Python 3.10.12 [GCC 11.4.0] on linux
>>> import torch
>>> print(torch.version.cuda)
12.2

Irrerwirrer commented 1 week ago

Same problem here. The current vllm pins PyTorch 2.3.0 in setup.py, but print(torch.__version__) shows that my installed version is 2.1.0a0+41361538.nv23.06, which is the latest supported by my JetPack 5.1.2. According to the NVIDIA documentation, PyTorch 2.3.0 is not supported on my setup.

To work around this, I attempted to install vllm-0.2.4, which only requires PyTorch 2.1.0. However, I ran into "OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.", even though I had already added the correct CUDA_HOME to my .bashrc.
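
I still need to verify whether the export actually reaches the process that runs pip, since .bashrc only takes effect in new interactive shells (and is not sourced under sudo). A quick check from the same shell that runs the install (the /usr/local/cuda path is the usual JetPack location, an assumption for my setup):

# Does the current process actually see CUDA_HOME?
import os
print(os.environ.get("CUDA_HOME"))                 # None means the export never reached this shell
print(os.path.exists("/usr/local/cuda/bin/nvcc"))  # True if the usual JetPack CUDA root is present

Setting it inline for a single command should also work: CUDA_HOME=/usr/local/cuda pip install vllm==0.2.4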

Acc1143 commented 1 week ago

How do I solve this?