When I try to run multi-GPU inference with this image, I get the following error:
INFO 04-23 19:30:01 pynccl_utils.py:17] Failed to import NCCL library: libnccl.so.2: cannot open shared object file: No such file or directory
INFO 04-23 19:30:01 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-23 19:30:03 selector.py:16] Using FlashAttention backend.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:04 pynccl.py:53] Failed to load NCCL library from libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerVllm pid=1209) INFO 04-23 19:30:04 pynccl_utils.py:17] Failed to import NCCL library: libnccl.so.2: cannot open shared object file: No such file or directory
(RayWorkerVllm pid=1209) INFO 04-23 19:30:04 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerVllm pid=1209) INFO 04-23 19:30:05 selector.py:16] Using FlashAttention backend.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] Traceback (most recent call last):
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] return executor(*args, **kwargs)
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] ^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] init_distributed_environment(self.parallel_config, self.rank,
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 287, in init_distributed_environment
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] pynccl_utils.init_process_group(
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/parallel_utils/pynccl_utils.py", line 45, in init_process_group
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] ^^^^^^^^^^^^^^
(RayWorkerVllm pid=1209) ERROR 04-23 19:30:05 ray_utils.py:44] NameError: name 'ncclGetVersion' is not defined
Looking at the installed packages, I don't see libnccl anywhere in the image. Judging from https://github.com/rh-aiservices-bu/llm-on-openshift/blob/6864d21fdea52a714078d322b4f7b2bc058fdef6/llm-servers/vllm/Containerfile#L53, the intention was perhaps to install a matching libnccl, but it looks like that step got missed?
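For anyone who wants to confirm this inside the container, here is a rough sketch of a check that tries to load the library the same way the log suggests vLLM does. The helper name `probe_nccl` is my own and is not part of vLLM. It assumes that `VLLM_NCCL_SO_PATH` (the variable named in the error above) holds a path to the shared object, and that the library exports NCCL's `ncclGetVersion(int*)` C function.

```python
import ctypes
import os


def probe_nccl(so_path=None):
    """Try to load NCCL and return (ok, detail). Hypothetical helper, not vLLM code."""
    # Use the explicit path if given, then VLLM_NCCL_SO_PATH (named in the
    # error log), then the default soname the log complains about.
    so_path = so_path or os.environ.get("VLLM_NCCL_SO_PATH", "libnccl.so.2")
    try:
        lib = ctypes.CDLL(so_path)
    except OSError as exc:
        # Same failure mode as "cannot open shared object file" in the log.
        return False, f"cannot load {so_path}: {exc}"

    try:
        get_version = lib.ncclGetVersion
    except AttributeError:
        return False, f"{so_path} loaded, but it has no ncclGetVersion symbol"

    # ncclGetVersion writes an integer version code and returns 0 on success.
    get_version.restype = ctypes.c_int
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    version = ctypes.c_int()
    rc = get_version(ctypes.byref(version))
    if rc != 0:
        return False, f"ncclGetVersion failed with code {rc}"
    return True, f"loaded {so_path}, NCCL version code {version.value}"
```

Running `print(probe_nccl())` in the image should report whether the library can be found at all. If it cannot, installing libnccl or pointing `VLLM_NCCL_SO_PATH` at an existing copy would be the next thing to try.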