Describe the bug

I fine-tuned "decapoda-research/llama-7b-hf" on a cloud GPU, which produced adapter_model.bin and adapter_config.json. I now want to run the 4-bit quantized (ggml) model together with this adapter. I tried the following command:

python server.py --model lama_7b --lora rv2 --cpu

and got this error:

AttributeError: 'LlamaCppModel' object has no attribute 'config'
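Judging by the traceback in the Logs section below, PEFT's PeftModel.from_pretrained copies the base model's config attribute, which Hugging Face transformers models provide but the webui's llama.cpp wrapper does not, so a LoRA cannot be attached to a ggml model this way. A possible workaround is to merge the adapter into the unquantized HF model first and only then convert and quantize the merged weights. Below is an untested sketch of that route; the adapter path ./rv2 and the output folder name are placeholders, and it assumes a peft release that exposes merge_and_unload():

```python
# Untested sketch: merge the LoRA into the fp16 HF model, then re-quantize.
# "./rv2" (adapter_config.json + adapter_model.bin) and the output folder are
# placeholder paths; assumes a peft version that provides merge_and_unload().
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
)
merged = PeftModel.from_pretrained(base, "./rv2")
merged = merged.merge_and_unload()  # bake the LoRA deltas into the base weights
merged.save_pretrained("./llama-7b-rv2-merged")

tok = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
tok.save_pretrained("./llama-7b-rv2-merged")
```

The merged folder can then be converted to ggml and quantized to q4_0 with llama.cpp's conversion and quantize tools, producing a ggml-model-q4_0.bin that already contains the fine-tune, so no --lora flag is needed at load time.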
Is there an existing issue for this?
[X] I have searched the existing issues
Reproduction
python server.py --model lama_7b --lora rv2 --cpu
Screenshot
No response
Logs
Loading lama_7b...
llama.cpp weights detected: models\lama_7b\ggml-model-q4_0.bin
llama.cpp: loading model from models\lama_7b\ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 2
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 2052.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 2048.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Adding the LoRA rv2 to the model...
Traceback (most recent call last):
File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\text-generation-webui\server.py", line 473, in <module>
add_lora_to_model(shared.args.lora)
File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\text-generation-webui\modules\LoRA.py", line 28, in add_lora_to_model
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_name}"), **params)
File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 143, in from_pretrained
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 514, in __init__
super().__init__(model, peft_config)
File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 74, in __init__
self.config = self.base_model.config
AttributeError: 'LlamaCppModel' object has no attribute 'config'
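For what it's worth, the failing line is peft_model.py line 74, self.config = self.base_model.config: PEFT assumes its base model is a transformers PreTrainedModel exposing a config attribute, while LlamaCppModel is the webui's thin wrapper around llama.cpp and has no such attribute. A stripped-down illustration of the failure mode (both classes here are stand-ins, not the real implementations):

```python
# Stand-ins for the classes named in the traceback; not the real code.
class LlamaCppModel:
    """Wraps a ggml model: it can generate text, but carries no
    transformers-style `config` attribute for PEFT to copy."""

class PeftModelShim:
    def __init__(self, base_model):
        self.base_model = base_model
        # Mirrors peft_model.py line 74 from the traceback above:
        self.config = self.base_model.config

PeftModelShim(LlamaCppModel())
# -> AttributeError: 'LlamaCppModel' object has no attribute 'config'
```

If I understand correctly, llama.cpp has since gained its own LoRA path (a convert-lora-to-ggml.py script plus a --lora flag on its main binary), which might become an alternative if the webui exposes it for ggml models; until then, merging into the HF model as sketched above looks like the workable route.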
System Info
No response