vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: vllm fails to run with the latest NVIDIA driver 555.85 #5035

Open gaye746560359 opened 5 months ago

gaye746560359 commented 5 months ago

2024-05-24 23:49:38 WARNING 05-24 15:49:38 utils.py:327] Not found nvcc in /usr/local/cuda. Skip cuda version check!
2024-05-24 23:49:38 INFO 05-24 15:49:38 config.py:379] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. But it may cause slight accuracy drop without scaling factors. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
2024-05-24 23:49:38 WARNING 05-24 15:49:38 config.py:405] Possibly too large swap space. 4.00 GiB out of the 9.71 GiB total CPU memory is allocated for the swap space.
2024-05-24 23:49:38 INFO 05-24 15:49:38 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='shenzhi-wang/Llama3-8B-Chinese-Chat', speculative_config=None, tokenizer='shenzhi-wang/Llama3-8B-Chinese-Chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=fp8, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=gpt-3.5-turbo)
2024-05-24 23:49:39 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-05-24 23:49:39 INFO 05-24 15:49:39 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
2024-05-24 23:49:39 WARNING 05-24 15:49:39 utils.py:465] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
2024-05-24 23:49:39 Traceback (most recent call last):
2024-05-24 23:49:39   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-24 23:49:39     return _run_code(code, main_globals, None,
2024-05-24 23:49:39   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-24 23:49:39     exec(code, run_globals)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 168, in <module>
2024-05-24 23:49:39     engine = AsyncLLMEngine.from_engine_args(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
2024-05-24 23:49:39     engine = cls(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
2024-05-24 23:49:39     self.engine = self._init_engine(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
2024-05-24 23:49:39     return engine_class(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 160, in __init__
2024-05-24 23:49:39     self.model_executor = executor_class(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
2024-05-24 23:49:39     self._init_executor()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
2024-05-24 23:49:39     self._init_non_spec_worker()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 67, in _init_non_spec_worker
2024-05-24 23:49:39     self.driver_worker = self._create_worker()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 59, in _create_worker
2024-05-24 23:49:39     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 131, in init_worker
2024-05-24 23:49:39     self.worker = worker_class(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 73, in __init__
2024-05-24 23:49:39     self.model_runner = ModelRunner(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 145, in __init__
2024-05-24 23:49:39     self.attn_backend = get_attn_backend(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 25, in get_attn_backend
2024-05-24 23:49:39     backend = _which_attn_to_use(dtype)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 67, in _which_attn_to_use
2024-05-24 23:49:39     if torch.cuda.get_device_capability()[0] < 8:
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 430, in get_device_capability
2024-05-24 23:49:39     prop = get_device_properties(device)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 444, in get_device_properties
2024-05-24 23:49:39     _lazy_init()  # will define _get_device_properties
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 293, in _lazy_init
2024-05-24 23:49:39     torch._C._cuda_init()
2024-05-24 23:49:39 RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
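The traceback dies inside PyTorch's lazy CUDA initialization, before any vllm-specific kernel runs, so the failure can be reproduced without vllm at all. A minimal sketch (assuming the same container image, with torch importable) to confirm whether CUDA init itself is broken:

```python
# Minimal check run inside the same container: if this fails with
# "Error 500: named symbol not found", the problem is the driver /
# container-toolkit combination, not vllm itself.
import torch

try:
    # get_device_capability() triggers lazy CUDA init, which is exactly
    # where the vllm traceback above fails.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"CUDA init OK, compute capability {major}.{minor}")
except RuntimeError as exc:
    print(f"CUDA init failed: {exc}")
```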

HelloCard commented 5 months ago

WSL plus a cudaGetDeviceCount error... I ran into a similar failure once: the cause was the Sunlogin (向日葵) remote-desktop tool installed on the Windows side, outside WSL. The virtual display adapter it created interfered with vllm inside WSL.

gaye746560359 commented 5 months ago

WSL plus a cudaGetDeviceCount error... I ran into a similar failure once: the cause was the Sunlogin (向日葵) remote-desktop tool installed on the Windows side, outside WSL. The virtual display adapter it created interfered with vllm inside WSL.

I uninstalled Sunlogin, but it still fails with the same error.

MarioLiebisch commented 5 months ago

Stumbled over this issue while looking around to see if there have been any fixes.

I just checked the Nvidia driver feedback thread and it's actually a listed known issue:

PyTorch-CUDA Docker not compatible with CUDA 12.5/GRD 555.85 [4668302]
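A quick way to confirm from inside the container that you are on the affected driver branch is to query the driver version over NVML. A hedged sketch, assuming the nvidia-ml-py (pynvml) package is installed in the environment:

```python
# Query the host driver version through NVML; driver 555.85 combined with
# an older nvidia-container-toolkit is the pairing listed as broken.
import pynvml

pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetDriverVersion()
print(f"NVIDIA driver: {driver}")
pynvml.nvmlShutdown()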

cliffwoolley commented 5 months ago

Please see https://github.com/NVIDIA/nvidia-container-toolkit/issues/520 .

gaye746560359 commented 5 months ago

Please see NVIDIA/nvidia-container-toolkit#520

If this symptom shows up with Docker Desktop, that means a fix is in progress (upgrading the bundled nvidia-container-toolkit). When is the fixed update expected?

cliffwoolley commented 5 months ago

Please see NVIDIA/nvidia-container-toolkit#520

If this symptom shows up with Docker Desktop, that means a fix is in progress (upgrading the bundled nvidia-container-toolkit). When is the fixed update expected?

We are moving on it as quickly as we can, but I don't have an ETA yet, which is why I didn't list one in the other issue.

cliffwoolley commented 5 months ago

Docker Desktop 4.31 was released yesterday and includes NVIDIA Container Toolkit 1.15.0, which resolves this issue.
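After updating Docker Desktop, a quick sanity check is to run nvidia-smi in a throwaway GPU container and confirm CUDA is visible again. A minimal sketch; the CUDA base image tag below is only illustrative, any CUDA-enabled image works:

```python
# Sanity check after the Docker Desktop 4.31 update: if the bundled
# toolkit fix is in place, this prints the usual nvidia-smi table instead
# of failing with a "named symbol not found"-style error.
import subprocess

subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
    check=True,
)
```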

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!