File "/usr/local/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 118, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/usr/local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 277, in from_engine_args
engine = cls(
File "/usr/local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 148, in __init__
self.model_executor = executor_class(
File "/usr/local/lib/python3.8/site-packages/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/usr/local/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 22, in _init_executor
self._init_non_spec_worker()
File "/usr/local/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 51, in _init_non_spec_worker
self.driver_worker.load_model()
File "/usr/local/lib/python3.8/site-packages/vllm/worker/worker.py", line 117, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.8/site-packages/vllm/worker/model_runner.py", line 162, in load_model
self.model = get_model(
File "/usr/local/lib/python3.8/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
return loader.load_model(model_config=model_config,
File "/usr/local/lib/python3.8/site-packages/vllm/model_executor/model_loader/loader.py", line 225, in load_model
model.load_weights(
File "/usr/local/lib/python3.8/site-packages/vllm/model_executor/models/llama.py", line 411, in load_weights
param = params_dict[name]
KeyError: 'base_model.model.model.layers.0.mlp.down_proj.lora_A.weight'
It seems that only the base model is loaded, and the LoRA adapter is ignored.
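For context on the KeyError: in the `llama.py` frame above, each checkpoint key is looked up directly in the model's registered-parameter dict (`param = params_dict[name]`), so an adapter-style key such as `...lora_A.weight` has no matching entry. A minimal, self-contained sketch of that lookup pattern (not vLLM's actual loader code, just an illustration of why the lookup fails):

```python
# Illustration only -- not vLLM's actual loader, just the failing lookup pattern.
import torch
import torch.nn as nn

model = nn.Linear(4, 4)                       # stands in for the base model
params_dict = dict(model.named_parameters())  # registered names: 'weight', 'bias'

checkpoint = {
    "weight": torch.zeros(4, 4),         # a base-model key the model knows about
    "lora_A.weight": torch.zeros(2, 4),   # an adapter key the model does not know
}

try:
    for name, loaded_weight in checkpoint.items():
        params_dict[name].data.copy_(loaded_weight)
except KeyError as err:
    print(f"unrecognized checkpoint key: {err}")
```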
Your current environment
How would you like to use vllm
I have a model that has been fine-tuned and saved:
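For illustration only, assuming a standard PEFT/LoRA setup (the actual training script is not shown here), the saving step might look roughly like the sketch below; every model name, config value, and path is a placeholder.

```python
# Hypothetical sketch of a PEFT/LoRA fine-tune-and-save step -- all names and
# paths are placeholders, not the actual training script.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder base model
lora_cfg = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(base, lora_cfg)

# ... fine-tuning loop omitted ...

# save_pretrained on a PEFT-wrapped model writes the adapter weights under
# names like 'base_model.model.model.layers.0.mlp.down_proj.lora_A.weight',
# the same shape of key that appears in the traceback above.
model.save_pretrained("/path/to/finetuned-model")  # placeholder path
```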
And I can load the model locally with:
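Under the same assumption, loading the model locally with PEFT might look like this (again only a sketch, with a placeholder path):

```python
# Hypothetical sketch of loading the fine-tuned model locally -- placeholder path.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("/path/to/finetuned-model")
tokenizer = AutoTokenizer.from_pretrained("/path/to/finetuned-model")
```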
What is the best way to run inference on this model with vLLM? When I try to call it, I get the exception shown at the top of this issue.
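A minimal vLLM call of the kind that produces the traceback at the top would look roughly like this (the path is a placeholder, not the actual one):

```python
# Hypothetical sketch of the failing call -- the path is a placeholder.
from vllm import LLM

# Constructing the engine triggers weight loading, which is where the
# traceback above originates.
llm = LLM(model="/path/to/finetuned-model")
outputs = llm.generate(["Hello, my name is"])
```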
If I print out the model inside vllm:
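One way to reach the underlying module for printing is through vLLM's internal attributes, matching the call chain in the traceback above; note that these are version-specific internals rather than a public API, so the snippet below is only a sketch.

```python
# Assumption: these attribute names are vLLM internals (they match the call
# chain in the traceback above) and may change between versions.
from vllm import LLM

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder: a model that loads successfully
inner_model = llm.llm_engine.model_executor.driver_worker.model_runner.model
print(inner_model)  # prints the module tree, e.g. LlamaForCausalLM(...)
```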
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.