vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Error executing method load_model. This might cause deadlock in distributed execution. #4946

Open userandpass opened 5 months ago

userandpass commented 5 months ago

Your current environment

The output of `python collect_env.py`

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.27

Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4
GPU 4: Tesla T4
GPU 5: Tesla T4
GPU 6: Tesla T4
GPU 7: Tesla T4

Nvidia driver version: 530.30.02

🐛 Describe the bug

I changed "torch_dtype" to "float16" in the model configuration file 'config.json'
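For reference, that change amounts to something like the following (a minimal sketch; the model path is taken from the launch command below):

```python
import json

# Model directory from the launch command below.
config_path = "/data/deepseek/deepseek-coder-6.7b-base/config.json"

with open(config_path) as f:
    config = json.load(f)

# Declare the weights' dtype as float16, as described above.
config["torch_dtype"] = "float16"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```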

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --model /data/deepseek/deepseek-coder-6.7b-base --served-model-name deepseek --tensor-parallel-size 4 --port 1101

(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Traceback (most recent call last):
.....
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145]     self.model_runner.load_model()
ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
......
ERROR 05-21 12:53:01 worker_base.py:145]     magic_number = pickle_module.load(f, **pickle_load_args)
ERROR 05-21 12:53:01 worker_base.py:145] _pickle.UnpicklingError: invalid load key, 'v'.
......
[rank0]: _pickle.UnpicklingError: invalid load key, 'v'.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
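The failing frame is `torch.load` reading the checkpoint's pickle header, and `invalid load key, 'v'` usually means the `.bin` file is not a pickle archive at all, most often a Git LFS pointer stub (which begins with the text `version https://git-lfs...`) or a truncated download. A minimal sketch of a check, assuming the model directory from the command above:

```python
from pathlib import Path

# Model directory from the launch command above.
model_dir = Path("/data/deepseek/deepseek-coder-6.7b-base")

for weight_file in sorted(model_dir.glob("*.bin")):
    # Read only the first few bytes; real checkpoints are multiple GB.
    with weight_file.open("rb") as f:
        head = f.read(64)
    size_mb = weight_file.stat().st_size / 1e6
    if head.startswith(b"version https://git-lfs"):
        print(f"{weight_file.name}: Git LFS pointer stub ({size_mb:.3f} MB), weights not downloaded")
    else:
        print(f"{weight_file.name}: {size_mb:.1f} MB")
```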

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!