askcs517 opened this issue 2 weeks ago
Is cuda_graph dependent on nvidia-nccl-cu12?
When cuda_graph is enabled, a "RuntimeError: NCCL error" is raised when using nvidia-nccl-cu11.
Then this is an NCCL bug; you need to report it to https://github.com/NVIDIA/nccl.
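A minimal sketch for collecting NCCL's own diagnostics to attach to such a report. It assumes the failing vLLM run can be launched as a separate command. It sets NCCL's standard NCCL_DEBUG environment variable (at INFO, NCCL logs its version and communicator setup) and runs the command with that environment. The helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def nccl_debug_env(base: Optional[Mapping[str, str]] = None,
                   level: str = "INFO") -> Dict[str, str]:
    """Return a copy of `base` (default: the current os.environ) with NCCL_DEBUG set."""
    env = dict(os.environ if base is None else base)
    # NCCL reads NCCL_DEBUG at initialization; INFO prints version and setup details.
    env["NCCL_DEBUG"] = level
    return env


def run_with_nccl_debug(cmd: List[str]) -> int:
    """Run `cmd` in a child process with NCCL debug logging enabled; return its exit code."""
    return subprocess.run(cmd, env=nccl_debug_env()).returncode
```

For example, run_with_nccl_debug([sys.executable, "repro.py"]), where repro.py is a hypothetical script containing the failing vLLM call. The NCCL log lines then appear in the child process's console output. Building a fresh environment dict keeps the parent process's own environment unchanged.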
Your current environment
1. torch 2.3.0+cu118, vllm 0.4.3+cu118
2. [root@master1 v2]# pip show torch
   Name: torch
   Version: 2.3.0+cu118
   Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
   Home-page: https://pytorch.org/
   Author: PyTorch Team
   Author-email: packages@pytorch.org
   License: BSD-3
   Location: /opt/anaconda3/envs/vllm4/lib/python3.10/site-packages
   Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
3. (vllm4) [root@master1 v2]# pip list | grep nccl
   nvidia-nccl-cu11 2.20.5
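This sketch uses only the Python standard library and collects the version details above without copying pip show output by hand. The helper names are hypothetical. It reads the CUDA suffix of the installed torch build and lists any nvidia-nccl-cu* wheels. It assumes the usual wheel naming: "+cu118" means CUDA 11.8, and the -cu11/-cu12 suffix on the NCCL wheel gives the CUDA major version. For the environment reported here, it would be expected to show torch on CUDA 11 alongside nvidia-nccl-cu11 2.20.5.

```python
import re
from importlib import metadata
from typing import Dict, Optional


def cuda_major_from_local_tag(version: str) -> Optional[int]:
    """CUDA major version from a local tag such as '2.3.0+cu118' (-> 11); None if absent."""
    m = re.search(r"\+cu(\d{3,})$", version)
    # The last digit is the CUDA minor version: '118' -> 11, '121' -> 12.
    return int(m.group(1)[:-1]) if m else None


def cuda_major_from_nccl_wheel(name: str) -> Optional[int]:
    """CUDA major version from a wheel name such as 'nvidia-nccl-cu11' (-> 11)."""
    m = re.fullmatch(r"nvidia-nccl-cu(\d+)", name.lower().replace("_", "-"))
    return int(m.group(1)) if m else None


def installed_nccl_wheels() -> Dict[str, str]:
    """Map each installed nvidia-nccl-cu* distribution name to its version."""
    found: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if cuda_major_from_nccl_wheel(name) is not None:
            found[name.lower()] = dist.version
    return found


def report() -> str:
    """One line for torch, then one line per installed NCCL wheel."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = "not installed"
    lines = [f"torch {torch_version} (CUDA major: {cuda_major_from_local_tag(torch_version)})"]
    for name, ver in sorted(installed_nccl_wheels().items()):
        lines.append(f"{name} {ver} (CUDA major: {cuda_major_from_nccl_wheel(name)})")
    return "\n".join(lines)
```

Calling print(report()) inside the vllm4 environment prints the lines. Only package metadata is read, so the check works even when no GPU is visible.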
🐛 Describe the bug
When cuda_graph is enabled, a "RuntimeError: NCCL error" is raised when using nvidia-nccl-cu11. After installing nvidia-nccl-cu12 (pip install nvidia-nccl-cu12), it works.
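One way to confirm that the failure is tied to CUDA graphs, rather than to NCCL use in general, is an A/B run of the same multi-GPU setup. Run it once with CUDA graphs on (vLLM's default) and once with them off. This sketch assumes vLLM's offline LLM API, where enforce_eager=True disables CUDA graph capture. The model name and tensor_parallel_size=2 are placeholders, chosen so that the run actually communicates across GPUs through NCCL. The helper names are hypothetical.

```python
from typing import Any, Dict


def engine_kwargs(model: str, tp_size: int, use_cuda_graph: bool) -> Dict[str, Any]:
    """Keyword arguments for vllm.LLM(...) for one side of the A/B test."""
    return {
        "model": model,
        "tensor_parallel_size": tp_size,
        # enforce_eager=True makes vLLM skip CUDA graph capture and run eagerly.
        "enforce_eager": not use_cuda_graph,
    }


def run_once(use_cuda_graph: bool,
             model: str = "facebook/opt-125m",  # placeholder model
             tp_size: int = 2) -> str:          # placeholder: >1 GPU so NCCL is used
    """Load the model with or without CUDA graphs and generate a short completion."""
    # Imported lazily so this module can be loaded on machines without vLLM or GPUs.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(model, tp_size, use_cuda_graph))
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
    return outputs[0].outputs[0].text
```

Run run_once(True) and run_once(False) in separate processes, since each call initializes its own distributed workers. If only the CUDA graph run fails with nvidia-nccl-cu11 installed, that narrows the problem down for the NCCL report.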