vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: When cuda_graph is enabled, RuntimeError: NCCL error is reported using nvidia-nccl-cu11 #5679

Open askcs517 opened 2 weeks ago

askcs517 commented 2 weeks ago

Your current environment

1. `torch 2.3.0+cu118`, `vllm 0.4.3+cu118`

2. ```
   [root@master1 v2]# pip show torch
   Name: torch
   Version: 2.3.0+cu118
   Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
   Home-page: https://pytorch.org/
   Author: PyTorch Team
   Author-email: packages@pytorch.org
   License: BSD-3
   Location: /opt/anaconda3/envs/vllm4/lib/python3.10/site-packages
   Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
   ```

3. ```
   (vllm4) [root@master1 v2]# pip list | grep nccl
   nvidia-nccl-cu11    2.20.5
   ```
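To confirm which NCCL wheel (cu11 vs. cu12) is actually installed in an environment like the one above, a small stdlib-only check can help. This is a sketch; `nccl_versions` is a helper name introduced here, not a vLLM or PyTorch API:

```python
from importlib import metadata


def nccl_versions(packages=("nvidia-nccl-cu11", "nvidia-nccl-cu12")):
    """Return a mapping from NCCL wheel name to installed version (None if absent)."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    for pkg, ver in nccl_versions().items():
        print(pkg, ver or "not installed")
```

In the environment reported above, this would show `nvidia-nccl-cu11 2.20.5` and `nvidia-nccl-cu12 not installed`.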

🐛 Describe the bug

When cuda_graph is enabled, a `RuntimeError: NCCL error` is raised when using nvidia-nccl-cu11. After `pip install nvidia-nccl-cu12`, it works fine.
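Based on the workaround described above, the wheel swap would look like this (a sketch; run inside the affected environment, e.g. the `vllm4` conda env from the report):

```shell
# Replace the cu11 NCCL wheel with the cu12 one, per the report above
pip uninstall -y nvidia-nccl-cu11
pip install nvidia-nccl-cu12
```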

askcs517 commented 2 weeks ago

Is cuda_graph dependent on nvidia-nccl-cu12?
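While the NCCL question is open, one way to sidestep the failing path entirely is to disable CUDA graph capture via vLLM's `--enforce-eager` flag (a sketch, assuming vLLM 0.4.x's OpenAI-compatible server; the model name is illustrative, and eager mode trades some throughput for avoiding graph capture):

```shell
# --enforce-eager disables CUDA graph capture, so the path that triggers
# the NCCL error during graph capture is never exercised
python -m vllm.entrypoints.openai.api_server \
  --model facebook/opt-125m \
  --enforce-eager
```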

youkaichao commented 2 weeks ago

> When cuda_graph is enabled, RuntimeError: NCCL error is reported using nvidia-nccl-cu11

Then this is an NCCL bug; you need to report it to https://github.com/NVIDIA/nccl .