vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

NCCL cannot be captured in a graph #3069

Closed zhouzj1610 closed 5 months ago

zhouzj1610 commented 8 months ago

Running a 7B model on a single GPU works fine, but with two GPUs I get an NCCL error. vLLM version: v0.3.2.

command: python -m vllm.entrypoints.openai.api_server --model models/Qwen-14B-Chat --trust-remote-code --port 8509 --served-model-name qwen-14b-chat --dtype float16 --tensor-parallel-size 2

NCCL: 2.19.3+cuda11.0, CUDA: 11.6

error: NCCL WARN NCCL cannot be captured in a graph if either it wasn't built with CUDA runtime >= 11.3 or if the installed CUDA driver < R465.
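The warning above encodes two conditions: graph capture is refused when NCCL was built against a CUDA runtime older than 11.3, or when the installed driver is below R465. A minimal sketch of that check in Python (the function name and the example version tuples are illustrative, not NCCL's actual code):

```python
# Hypothetical helper mirroring the conditions in the NCCL warning:
# graph capture needs NCCL built with CUDA runtime >= 11.3 AND driver >= R465.
def supports_graph_capture(nccl_built_cuda: tuple, driver_version: int) -> bool:
    return nccl_built_cuda >= (11, 3) and driver_version >= 465

# NCCL 2.19.3+cuda11.0 (as in the report): built against CUDA 11.0
print(supports_graph_capture((11, 0), 520))  # False -> capture refused
# NCCL 2.18.6+cuda11.8: built against CUDA 11.8
print(supports_graph_capture((11, 8), 520))  # True -> capture allowed
```

This is why the error appears even though the host CUDA toolkit is 11.6: what matters is the runtime the NCCL wheel itself was built against.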

I checked the NCCL releases listed on NVIDIA's website; only three are listed: (screenshot attached)

zhouzj1610 commented 8 months ago

Finally solved it.

Cause: installing with `pip install xformers --index-url https://download.pytorch.org/whl/cu118` pulls in xformers-0.0.24+cu118, torch 2.2.0+cu118, and NCCL 2.19.3+cuda11.0 by default; that NCCL build is incompatible with my CUDA 11.6.

Fix: rebuild the virtual environment and install with `pip install xformers==0.0.23 --index-url https://download.pytorch.org/whl/cu118`, which installs xformers together with torch and automatically pulls in NCCL 2.18.6+cuda11.8, which is compatible with CUDA 11.6. Multi-GPU serving now works.
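Before relaunching the server, it can help to confirm which versions pip actually resolved. A small standard-library sketch (the helper name is mine, not part of vLLM):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg: str) -> str:
    """Return the installed version of `pkg`, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

# Packages whose resolved versions determine which NCCL build you get
# (names taken from the fix above).
for pkg in ("torch", "xformers"):
    print(pkg, installed_version(pkg))
```

If torch reports 2.2.0 here while the fix above expects the 0.0.23 xformers pin, the environment still has the incompatible NCCL and should be rebuilt.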