vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Does vLLM currently support Qwen LoRA models? #3201

Open qingjiaozyn opened 4 months ago

qingjiaozyn commented 4 months ago

I'm using multi-LoRA for offline inference:

sql_lora_path = "/home/zyn/models/slot_lora_gd"

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="/home/models/dem_14b/base", enable_lora=True, trust_remote_code=True)

sampling_params = SamplingParams(temperature=0, max_tokens=256, stop=["[/assistant]"])

prompts = [
    "[user] Write a SQL query to answer the question based on the table schema.\n\n context: CREATE TABLE table_name_74 (icao VARCHAR, airport VARCHAR)\n\n question: Name the ICAO for lilongwe international airport [/user] [assistant]",
    "[user] Write a SQL query to answer the question based on the table schema.\n\n context: CREATE TABLE table_name_11 (nationality VARCHAR, elector VARCHAR)\n\n question: When Anchero Pantaleone was the elector what is under nationality? [/user] [assistant]",
]

outputs = llm.generate(prompts, sampling_params, lora_request=LoRARequest("sql_adapter", 1, sql_lora_path))

llm = LLM(model="/home/models/dem_14b/base",

  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 128, in __init__
    self._init_workers()
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 181, in _init_workers
    self._run_workers("load_model")
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/worker/worker.py", line 100, in load_model
    self.model_runner.load_model()
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 88, in load_model
    self.model = get_model(self.model_config,
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/model_executor/utils.py", line 52, in get_model
    return get_model_fn(model_config, device_config, **kwargs)
  File "/root/miniconda3/envs/qwen/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 73, in get_model
    raise ValueError(
ValueError: Model QWenLMHeadModel does not support LoRA, but LoRA is enabled. Support for this model may be added in the future. If this is important to you, please open an issue on github.

jeejeelee commented 4 months ago

As shown in the error message, it's not supported.
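For context, LoRA-enabled architectures in this vLLM release (for example LlamaForCausalLM in vllm/model_executor/models/llama.py) carry class-level LoRA metadata that the loader checks before enabling adapters, so adding Qwen support would mean declaring the equivalent metadata on QWenLMHeadModel. The following is only an illustrative sketch modeled on the existing Llama implementation; the Qwen-specific module names are assumptions, not the actual patch:

import torch.nn as nn

class QWenLMHeadModel(nn.Module):
    # Illustrative only: the attribute names mirror other LoRA-enabled models
    # in this vLLM release; the Qwen projection names below are assumptions.
    packed_modules_mapping = {
        "gate_up_proj": ["w1", "w2"],
    }
    supported_lora_modules = ["c_attn", "c_proj", "w1", "w2"]
    embedding_modules = {}
    embedding_padding_modules = []
    # ... the rest of the existing Qwen implementation would stay unchanged ...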

pinbop commented 4 months ago

May I ask when this problem will be resolved, and whether there is a plan for it?

chekakaa commented 4 months ago

Hey, I've run into the same problem. May I ask when loading a Qwen LoRA model will be supported? I'd appreciate it if you could let me know once this is resolved; it's blocking me at the moment. Thanks.
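In the meantime, a common workaround (independent of vLLM's multi-LoRA feature) is to merge the adapter into the base weights with PEFT and serve the merged checkpoint as a plain model. A minimal sketch, assuming the adapter at the path from the original post is in standard PEFT format; the merged_path output directory is hypothetical:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/home/models/dem_14b/base"
adapter_path = "/home/zyn/models/slot_lora_gd"
merged_path = "/home/models/dem_14b/merged"  # hypothetical output directory

# Load the base model and apply the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(base_path, trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the adapter weights into the base weights and save a plain checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path, trust_remote_code=True).save_pretrained(merged_path)

The merged model can then be served by vLLM without enable_lora, e.g. LLM(model=merged_path, trust_remote_code=True), at the cost of losing the ability to switch adapters at runtime.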

BillFang12 commented 4 months ago

I've also run into the same situation. When will Qwen support LoRA models?

mali404 commented 4 months ago

Please enable support for this, @vllm team.