Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Your current environment

GPU: NVIDIA T4 × 4
vllm version: 0.4.3 (also reproduced with 0.5.5)
🐛 Describe the bug
On T4 GPUs with vllm==0.4.3, deploying multiple LoRA adapters through vLLM fails with the following error:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device.
```

My deploy command:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy --tensor_parallel_size 4 --dtype fp16 \
    --model_type qwen1half-7b-chat \
    --model_id_or_path /cloud/user/data/data0806/llm/M2/Chat_New \
    --ckpt_dir /cloud/user/data/data0806/llm/M2/checkpoint-200/ \
    --infer_backend vllm --vllm_enable_lora true --max_model_len 512 --enforce_eager
```

I also upgraded vllm to 0.5.5, but the same error still occurs.
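For context on why this may fail: the "no kernel image is available" error usually means the CUDA kernels were not compiled for the GPU's compute capability. The T4 is sm_75, and vLLM's multi-LoRA (Punica) kernels in the 0.4.x/0.5.x series are, to my understanding, built only for compute capability >= 8.0. A minimal sketch of that check (the `lora_kernels_supported` helper is hypothetical, not a vLLM API):

```python
def lora_kernels_supported(capability: tuple) -> bool:
    # Hypothetical helper: vLLM's Punica multi-LoRA kernels in the 0.4.x/0.5.x
    # series are, to my understanding, compiled only for compute capability
    # >= 8.0 (Ampere and newer); the T4 is sm_75, so enabling multi-LoRA there
    # raises "no kernel image is available for execution on the device".
    return tuple(capability) >= (8, 0)

# The actual capability can be confirmed at runtime with
# torch.cuda.get_device_capability(), which returns (7, 5) on a T4.
print(lora_kernels_supported((7, 5)))  # T4 -> False
print(lora_kernels_supported((8, 0)))  # A100 -> True
```

If this is the cause, upgrading vllm alone would not help; the fix would require a GPU of compute capability 8.0 or newer, or a vLLM build whose LoRA kernels target sm_75.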