vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: NameError: name 'ncclGetVersion' is not defined #4294

Closed zhaoxf4 closed 5 months ago

zhaoxf4 commented 5 months ago

Your current environment

Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.27

Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-136-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6278C CPU @ 2.60GHz
Stepping:            7
CPU MHz:             2600.000
BogoMIPS:            5200.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.19.3
[pip3] torch==2.2.1
[pip3] triton==2.2.0
[pip3] vllm-nccl-cu12==2.18.1.0.3.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
[conda] torch                     2.2.1                    pypi_0    pypi
[conda] triton                    2.2.0                    pypi_0    pypi
[conda] vllm-nccl-cu12            2.18.1.0.3.0             pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     PHB     PHB     0-31    0               N/A
GPU1    PHB      X      PHB     PHB     0-31    0               N/A
GPU2    PHB     PHB      X      PHB     0-31    0               N/A
GPU3    PHB     PHB     PHB      X      0-31    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

Error logs:

INFO 04-23 19:58:45 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
(RayWorkerWrapper pid=18679) INFO 04-23 19:58:45 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
ERROR 04-23 19:58:45 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 04-23 19:58:45 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
INFO 04-23 19:58:45 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerWrapper pid=18679) ERROR 04-23 19:58:45 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerWrapper pid=18679) INFO 04-23 19:58:45 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
(RayWorkerWrapper pid=18679) INFO 04-23 19:58:45 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-23 19:58:46 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-23 19:58:46 selector.py:33] Using XFormers backend.
(RayWorkerWrapper pid=18679) INFO 04-23 19:58:46 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerWrapper pid=18679) INFO 04-23 19:58:46 selector.py:33] Using XFormers backend.
ERROR 04-23 19:58:47 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 04-23 19:58:47 worker_base.py:153] Traceback (most recent call last):
ERROR 04-23 19:58:47 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
ERROR 04-23 19:58:47 worker_base.py:153]     return executor(*args, **kwargs)
ERROR 04-23 19:58:47 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
ERROR 04-23 19:58:47 worker_base.py:153]     init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 04-23 19:58:47 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
ERROR 04-23 19:58:47 worker_base.py:153]     pynccl_utils.init_process_group(
ERROR 04-23 19:58:47 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
ERROR 04-23 19:58:47 worker_base.py:153]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
ERROR 04-23 19:58:47 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined

Reproduce commands:

$ conda create -n vllm-test python=3.10 pip
$ conda activate vllm-test
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ pip install -e .
$ CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server --model /data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct --dtype half --tensor-parallel-size 2

By the way, this problem does not occur with single-card inference. I searched for similar issues and reinstalled the environment several times as described in #4257, but it did not help.
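For reference, a minimal standalone check (not vLLM code, just a diagnostic sketch using the NCCL C API) of whether the library at VLLM_NCCL_SO_PATH can be loaded at all; the NameError above looks like a symptom, since ncclGetVersion is never bound once that load fails:

import ctypes
import os

# Same path the log above reports for VLLM_NCCL_SO_PATH; adjust as needed.
so_path = os.environ.get("VLLM_NCCL_SO_PATH",
                         "/usr/lib/x86_64-linux-gnu/libnccl.so.2")

lib = ctypes.CDLL(so_path)  # raises OSError if the .so cannot be loaded
version = ctypes.c_int()
ret = lib.ncclGetVersion(ctypes.byref(version))  # NCCL C API: ncclGetVersion(int *version)
print(f"ncclGetVersion returned {ret}, version code {version.value}")  # e.g. 21501 would mean 2.15.1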

youkaichao commented 5 months ago

Can you run with export NCCL_DEBUG=TRACE? This might be an NCCL problem.

zhaoxf4 commented 5 months ago

Can you run with export NCCL_DEBUG=TRACE? This might be an NCCL problem.

After the command is executed, the complete log is as follows:

(vllm-test) $ export NCCL_DEBUG=TRACE

(vllm-test) $ CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server --model /data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct --dtype half --tensor-parallel-size 2
INFO 04-24 09:39:20 api_server.py:151] vLLM API server version 0.4.1
INFO 04-24 09:39:20 api_server.py:152] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-24 09:39:20 config.py:948] Casting torch.bfloat16 to torch.float16.
2024-04-24 09:39:22,770 INFO worker.py:1749 -- Started a local Ray instance.
INFO 04-24 09:39:23 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
ERROR 04-24 09:39:29 worker_base.py:153]     return executor(*args, **kwargs)
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
ERROR 04-24 09:39:29 worker_base.py:153]     init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
ERROR 04-24 09:39:29 worker_base.py:153]     pynccl_utils.init_process_group(
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
ERROR 04-24 09:39:29 worker_base.py:153]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined
Traceback (most recent call last):
  File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/entrypoints/openai/api_server.py", line 159, in <module>
    engine = AsyncLLMEngine.from_engine_args(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 361, in from_engine_args
    engine = cls(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 319, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 437, in _init_engine
    return engine_class(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/llm_engine.py", line 148, in __init__
    self.model_executor = executor_class(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 44, in _init_executor
    self._init_workers_ray(placement_group)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 181, in _init_workers_ray
    self._run_workers("init_device")
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 323, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 154, in execute_method
    raise e
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
    return executor(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
    pynccl_utils.init_process_group(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
    logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
NameError: name 'ncclGetVersion' is not defined
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     init_worker_distributed_environment(self.parallel_config, self.rank,
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     pynccl_utils.init_process_group(
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined

It looks like no new logs were added. I found that my libnccl version is 2.15.1. Could this version be affecting multi-card communication?

(vllm-test) $ ll /usr/lib/x86_64-linux-gnu/libnccl.so.2
lrwxrwxrwx 1 root root 17 Sep 20  2022 /usr/lib/x86_64-linux-gnu/libnccl.so.2 -> libnccl.so.2.15.1*
youkaichao commented 5 months ago

[pip3] vllm-nccl-cu12==2.18.1.0.3.0

If you have vllm-nccl-cu12 installed, you don't need to specify VLLM_NCCL_SO_PATH. It should just work.

zhaoxf4 commented 5 months ago

[pip3] vllm-nccl-cu12==2.18.1.0.3.0

If you have vllm-nccl-cu12 installed, you don't need to specify VLLM_NCCL_SO_PATH. It should just work.

PR #4259 can avoid this problem, but it only ignores the error rather than fixing the cause. I'll try to reproduce the problem on another machine or in Docker.

kylejablon commented 5 months ago

Just chiming in to say I'm experiencing a similar issue. Perhaps it's just a problem with how my directories are set up. I have vllm-nccl-cu12==2.18.1.0.4.0 installed, and vLLM is still having trouble finding libnccl.

$ find . | grep libnccl

./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2

Digging through the latest version of find_nccl_library seemed to confirm the issue for me:

>>> find_nccl_library()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in find_nccl_library
  File "<stdin>", line 25, in find_library
ValueError: Cannot find libnccl.so.2 in the system.

I think manually passing in the libnccl path may be the only feasible solution.
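For what it's worth, a minimal sketch of that manual workaround (untested here), assuming the path from the find output above, the same api_server entrypoint used earlier in this thread, and a placeholder model name:

import os
import subprocess
import sys

# Point vLLM's loader at the libnccl.so.2 located by `find` above; the variable
# must be in the environment before vLLM spawns its workers.
env = dict(
    os.environ,
    VLLM_NCCL_SO_PATH="/usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2",
)

# Launch the OpenAI-compatible server as in the reproduce commands, with the override.
subprocess.run(
    [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id/path
        "--tensor-parallel-size", "2",
    ],
    env=env,
    check=True,
)

Note, though, that (as the reply below points out) this particular libnccl.so.2 is the copy shipped with PyTorch and carries the memory-usage issue that vllm-nccl-cu12 exists to avoid, so this is at best a stopgap.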

youkaichao commented 5 months ago

vllm-nccl-cu12==2.18.1.0.4.0 does not install to ./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2. It should be at ~/.config/vllm/nccl/cu12/libnccl.so.2.18.1. The home directory depends on which user installed it, so it is possible that your current user is different from the user who installed the package.
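A quick way to confirm this (just an illustrative check, using the path described above) is to look for the library under the current user's home:

import glob
import os

# vllm-nccl-cu12 is expected to place libnccl under ~/.config/vllm/nccl/cu12/ for the
# user who installed it; an empty result suggests it was installed under a different
# user's home directory (or not at all).
hits = glob.glob(os.path.expanduser("~/.config/vllm/nccl/cu12/libnccl.so.*"))
print(hits if hits else "no vllm-nccl-cu12 libnccl found under this user's home")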

./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 is the library installed by PyTorch, and it has a known problem of increased memory usage, reported at https://github.com/NVIDIA/nccl/issues/1234. That's why vLLM needs to use a different nccl dependency.

Until either https://github.com/NVIDIA/nccl/issues/1234 or https://github.com/pypi/support/issues/3792 is resolved, we have no choice but to ship libnccl.so this way. Sorry for the trouble; this is not what we want either. We would also prefer to manage the dependency in the standard pip way.