Open · userandpass opened this issue 5 months ago
Your current environment
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.27

Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4
GPU 4: Tesla T4
GPU 5: Tesla T4
GPU 6: Tesla T4
GPU 7: Tesla T4
Nvidia driver version: 530.30.02
🐛 Describe the bug
I changed "torch_dtype" to "float16" in the model configuration file 'config.json'
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --model /data/deepseek/deepseek-coder-6.7b-base --served-model-name deepseek --tensor-parallel-size 4 --port 1101
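For reference, a minimal offline-API sketch of the same load, reusing the model path and tensor-parallel size from the command above; here dtype="float16" is passed explicitly instead of relying on the edited config.json (the server command accepts the equivalent --dtype float16 flag):

```python
# Sketch of an equivalent offline load (assumes vLLM is installed and the
# model path below exists); dtype="float16" makes the half-precision cast
# explicit rather than relying on the edited config.json.
from vllm import LLM

llm = LLM(
    model="/data/deepseek/deepseek-coder-6.7b-base",
    tensor_parallel_size=4,
    dtype="float16",
)
print(llm.generate("def quick_sort(arr):")[0].outputs[0].text)
```

With the API server command above, model loading fails with: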
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145] Traceback (most recent call last):
.....
(RayWorkerWrapper pid=28810) ERROR 05-21 12:53:01 worker_base.py:145]     self.model_runner.load_model()
ERROR 05-21 12:53:01 worker_base.py:145] Error executing method load_model. This might cause deadlock in distributed execution.
......
ERROR 05-21 12:53:01 worker_base.py:145]     magic_number = pickle_module.load(f, **pickle_load_args)
ERROR 05-21 12:53:01 worker_base.py:145] _pickle.UnpicklingError: invalid load key, 'v'.
......
[rank0]: _pickle.UnpicklingError: invalid load key, 'v'.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
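"invalid load key, 'v'" is what pickle raises when a file torch.load expects to be a checkpoint actually starts with plain text. One common cause, unconfirmed here and only a guess, is that the *.bin files in the model directory are Git LFS pointer stubs beginning with "version https://git-lfs...". A quick check, assuming the same model path:

```python
# Hypothetical diagnostic (assumes the weights live under the path below).
# A Git LFS pointer stub is a small text file starting with "version ...",
# which is exactly what makes pickle fail with "invalid load key, 'v'".
import glob
import os

for path in glob.glob("/data/deepseek/deepseek-coder-6.7b-base/*.bin"):
    with open(path, "rb") as f:
        head = f.read(16)
    print(path, os.path.getsize(path), head)
    if head.startswith(b"version"):
        print("  -> looks like a Git LFS pointer; re-fetch the real weights (e.g. git lfs pull)")
```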