Open askcs517 opened 4 months ago
Is cuda_graph dependent on nvidia-nccl-cu12?
When cuda_graph is enabled, a RuntimeError: NCCL error is reported when using nvidia-nccl-cu11.
Then this is an NCCL bug; you need to report it to https://github.com/NVIDIA/nccl .
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
1. torch 2.3.0+cu118, vllm 0.4.3+cu118
2. [root@master1 v2]# pip show torch
   Name: torch
   Version: 2.3.0+cu118
   Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
   Home-page: https://pytorch.org/
   Author: PyTorch Team
   Author-email: packages@pytorch.org
   License: BSD-3
   Location: /opt/anaconda3/envs/vllm4/lib/python3.10/site-packages
   Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
3. (vllm4) [root@master1 v2]# pip list |grep nccl
   nvidia-nccl-cu11 2.20.5
🐛 Describe the bug
When cuda_graph is enabled, a RuntimeError: NCCL error is reported when using nvidia-nccl-cu11. After running pip install nvidia-nccl-cu12, it works.
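For anyone trying to reproduce this, it may help to confirm which NCCL wheels are actually installed in the environment, since the report hinges on nvidia-nccl-cu11 versus nvidia-nccl-cu12. The following is a minimal sketch, not something from the original report: the helper name `installed_nccl_wheels` is made up for illustration, and the optional `torch.cuda.nccl.version()` check assumes a CUDA build of PyTorch (it is skipped if that call fails).

```python
from importlib import metadata


def installed_nccl_wheels():
    """Return {normalized distribution name: version} for installed nvidia-nccl-* wheels.

    Hypothetical helper for illustration; it only inspects package metadata
    and does not load NCCL itself.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken or missing metadata
        normalized = name.lower().replace("_", "-")
        if normalized.startswith("nvidia-nccl-"):
            found[normalized] = dist.version
    return found


if __name__ == "__main__":
    wheels = installed_nccl_wheels()
    if wheels:
        for name, version in sorted(wheels.items()):
            print(f"{name} {version}")
    else:
        print("no nvidia-nccl-* wheel found")

    # Optional: the NCCL version PyTorch itself reports.
    # Assumes a CUDA-enabled torch build; any failure is reported and ignored.
    try:
        import torch

        print("torch reports NCCL:", torch.cuda.nccl.version())
    except Exception as exc:
        print("could not query NCCL version from torch:", exc)
```

If both a cu11 and a cu12 wheel show up, it is worth including that in the bug report, since which library gets loaded at runtime may then depend on the environment.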