Closed lordk911 closed 3 weeks ago
@lordk911 The vLLM model does not currently support LoRA. Support will be added within the next two releases.
Thanks for the reply.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
Describe the bug
After fine-tuning the qwen1.5-14b-chat-gptq-int4 model with LoRA via swift, inference is very slow once the peft-model is loaded.
To Reproduce
To help us reproduce this bug, please provide the information below:
cat swift/output/qwen1half-14b-chat-int4/v2-20240407-163120/checkpoint-1546/default/adapter_config.json
cat swift/output/qwen1half-14b-chat-int4/v2-20240407-163120/checkpoint-1546/configuration.json
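When reading an adapter_config.json like the one above, the fields that most affect adapter behavior are the rank, alpha, and target modules. A small sketch of inspecting them (the values below are made-up examples, not the reporter's actual config):

```python
import json

# Hypothetical adapter_config.json contents for illustration only;
# the reporter's actual values may differ.
raw = '''{
  "peft_type": "LORA",
  "r": 8,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}'''

cfg = json.loads(raw)
scaling = cfg["lora_alpha"] / cfg["r"]   # effective LoRA scaling factor
print(f"rank={cfg['r']} alpha={cfg['lora_alpha']} scaling={scaling}")
print("adapted modules:", ", ".join(cfg["target_modules"]))
```

The more modules listed in `target_modules`, the more layers carry the extra adapter computation at inference time.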
Script used to launch the model:
xinference launch -n qwen1.5-chat -u qwen1.5-14B-Chat-SQL -s 14 -f gptq --max_model_len 32000 -e "http://10.9.123.456:9997" --worker-ip 10.9.123.456 --peft-model-path /data/llm-project/swift/output/qwen1half-14b-chat-int4/v2-20240407-163120/checkpoint-1546/default
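For context on why loading an unmerged adapter can slow inference: with LoRA, every adapted layer computes y = Wx + (alpha/r)·B(Ax) on each forward pass, i.e., two extra matmuls per layer compared with serving merged weights. A minimal NumPy sketch of that arithmetic, with illustrative (not real) dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8          # hidden size and LoRA rank (illustrative values)
alpha = 16            # corresponds to lora_alpha in adapter_config.json

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection
x = rng.standard_normal(d)

# Unmerged path: base matmul plus two extra low-rank matmuls per call
y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))

# Merged path: fold the adapter into W once, then one matmul per call
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_unmerged, y_merged))
```

The two paths are numerically equivalent, which is why merging the adapter into the base weights (where the serving stack and quantization format allow it) is a common way to recover full inference speed.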
The xinference worker log:
If I use qwen1.5-chat-gptq-14b-Int4 directly, the xinference worker log is:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.