vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Error executing method load_model. This might cause deadlock in distributed execution. #4946

Open userandpass opened 5 months ago

userandpass commented 5 months ago

Your current environment

The output of `python collect_env.py`

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.27

Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4
GPU 4: Tesla T4
GPU 5: Tesla T4
GPU 6: Tesla T4
GPU 7: Tesla T4

Nvidia driver version: 530.30.02

🐛 Describe the bug

I changed "torch_dtype" to "float16" in the model configuration file 'config.json'
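For reference, that change amounts to something like the following (a minimal sketch; the model path is taken from the launch command below):

```python
import json

# Model directory from the launch command below.
config_path = "/data/deepseek/deepseek-coder-6.7b-base/config.json"

with open(config_path) as f:
    config = json.load(f)

# Declare the weights' dtype as float16, as described above.
config["torch_dtype"] = "float16"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```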

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --model /data/deepseek/deepseek-coder-6.7b-base --served-model-name deepseek --tensor-parallel-size 4 --port 1101

(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Traceback (most recent call last):
.....
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145]     self.model_runner.load_model()
ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
......
ERROR 05-21 12:53:01 worker_base.py:145]     magic_number = pickle_module.load(f, **pickle_load_args)
ERROR 05-21 12:53:01 worker_base.py:145] _pickle.UnpicklingError: invalid load key, 'v'.
......
[rank0]: _pickle.UnpicklingError: invalid load key, 'v'.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
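The failing frame is `torch.load` reading the checkpoint's pickle header, and `invalid load key, 'v'` usually means the `.bin` file is not a pickle archive at all, most often a Git LFS pointer stub (which begins with the text `version https://git-lfs...`) or a truncated download. A minimal sketch of a check, assuming the model directory from the command above:

```python
from pathlib import Path

# Model directory from the launch command above.
model_dir = Path("/data/deepseek/deepseek-coder-6.7b-base")

for weight_file in sorted(model_dir.glob("*.bin")):
    # Read only the first few bytes; real checkpoints are multiple GB.
    with weight_file.open("rb") as f:
        head = f.read(64)
    size_mb = weight_file.stat().st_size / 1e6
    if head.startswith(b"version https://git-lfs"):
        print(f"{weight_file.name}: Git LFS pointer stub ({size_mb:.3f} MB), weights not downloaded")
    else:
        print(f"{weight_file.name}: {size_mb:.1f} MB")
```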

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!