vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: When cuda_graph is enabled, RuntimeError: NCCL error is reported using nvidia-nccl-cu11 #5679

Open askcs517 opened 4 months ago

askcs517 commented 4 months ago

Your current environment

1. torch 2.3.0+cu118, vllm 0.4.3+cu118

2. [root@master1 v2]# pip show torch
   Name: torch
   Version: 2.3.0+cu118
   Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
   Home-page: https://pytorch.org/
   Author: PyTorch Team
   Author-email: packages@pytorch.org
   License: BSD-3
   Location: /opt/anaconda3/envs/vllm4/lib/python3.10/site-packages
   Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions

3. (vllm4) [root@master1 v2]# pip list | grep nccl
   nvidia-nccl-cu11  2.20.5

🐛 Describe the bug

When cuda_graph is enabled, a RuntimeError: NCCL error is raised when using nvidia-nccl-cu11. After pip install nvidia-nccl-cu12, it works fine.
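The workaround described above amounts to swapping the CUDA 11 NCCL wheel for the CUDA 12 one; as an alternative (an assumption, not confirmed in this thread), vLLM's --enforce-eager flag disables CUDA graph capture entirely, which may sidestep the failure while staying on cu11:

```shell
# Workaround reported in this issue: replace the cu11 NCCL wheel with the cu12 one.
pip uninstall -y nvidia-nccl-cu11
pip install nvidia-nccl-cu12

# Alternative sketch (assumption): run vLLM in eager mode so no CUDA graph
# capture happens. Replace <your-model> with the model you are serving.
python -m vllm.entrypoints.openai.api_server --model <your-model> --enforce-eager
```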

askcs517 commented 4 months ago

Is cuda_graph dependent on nvidia-nccl-cu12?
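One way to see which NCCL wheel your environment actually resolved is to query installed package metadata. A minimal stdlib-only sketch (the wheel names are the ones mentioned in this issue):

```python
# Check which nvidia-nccl-* wheels are installed in the current environment.
import importlib.metadata


def installed_nccl_wheels():
    """Return {wheel_name: version} for any nvidia-nccl wheels found."""
    found = {}
    for name in ("nvidia-nccl-cu11", "nvidia-nccl-cu12"):
        try:
            found[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Wheel not installed in this environment; skip it.
            pass
    return found


print(installed_nccl_wheels())
```

In the environment reported above, this would show nvidia-nccl-cu11 at 2.20.5.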

youkaichao commented 4 months ago

> When cuda_graph is enabled, RuntimeError: NCCL error is reported using nvidia-nccl-cu11

Then this is an NCCL bug; you need to report it to https://github.com/NVIDIA/nccl.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!