vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: deploy multi lora by vllm mode error #7946

Open askcs517 opened 2 months ago

askcs517 commented 2 months ago

Your current environment

Env: CUDA 11.8, vLLM 0.4.3

🐛 Describe the bug

On T4 GPUs with vLLM 0.4.3, deploying multiple LoRA adapters through vLLM fails with:

RuntimeError: CUDA error: no kernel image is available for execution on the device

My deploy command:

CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy --tensor_parallel_size 4 --dtype fp16 --model_type qwen1half-7b-chat --model_id_or_path /cloud/user/data/data0806/llm/M2/Chat_New --ckpt_dir /cloud/user/data/data0806/llm/M2/checkpoint-200/ --infer_back vllm --vllm_enable_lora true --max_model_len 512 --enforce_eager

I also tried upgrading vLLM to 0.5.5, but the error persists.
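To help isolate whether the failure comes from vLLM's LoRA kernels rather than from swift, here is a minimal sketch of loading a LoRA adapter directly through vLLM's Python API. The model name, adapter name, and paths below are placeholders, not values from the report.

```python
# Minimal sketch (placeholders, not the reporter's exact setup): load one
# LoRA adapter through vLLM's Python API so the same LoRA kernels are
# exercised without going through swift.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen1.5-7B-Chat",  # base model; replace with the local path
    enable_lora=True,
    tensor_parallel_size=4,
    dtype="float16",
    max_model_len=512,
    enforce_eager=True,
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(max_tokens=64),
    # lora_int_id must be a positive integer, unique per adapter
    lora_request=LoRARequest("my_adapter", 1, "/path/to/checkpoint-200"),
)
print(outputs[0].outputs[0].text)
```

If this raises the same "no kernel image is available" error, the problem is in the vLLM/CUDA build rather than in the swift deployment wrapper.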


DarkLight1337 commented 2 months ago

Try upgrading your CUDA version. CUDA 11.8 may be too old.
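For reference, a quick diagnostic (a sketch, not something from this thread) to confirm which CUDA build of PyTorch is installed and what compute capability the GPUs report; a T4 is compute capability 7.5, and "no kernel image is available" typically means the installed wheels were not compiled for that architecture or CUDA version.

```python
# Diagnostic sketch: print the CUDA version PyTorch was built with and the
# compute capability of each visible GPU.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")
```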