oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

AttributeError when loading the "ggml-model-q4_0.bin" model with a lora adapter #1151

Closed: Teragron closed this issue 1 year ago

Teragron commented 1 year ago

Describe the bug

I fine-tuned the "decapoda-research/llama-7b-hf" model on a cloud GPU and ended up with adapter_model.bin and adapter_config.json files. What I want now is to run the quantized 4-bit (ggml) model with this adapter. I tried it with the following command: python server.py --model lama_7b --lora rv2 --cpu and got this error: AttributeError: 'LlamaCppModel' object has no attribute 'config'
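
For reference, this is roughly how the adapter attaches to the full fp16 Hugging Face model with peft (a rough sketch from memory; "loras/rv2" is just my local adapter folder):

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# peft can wrap this model because it is a regular transformers model and exposes a .config attribute
base = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "loras/rv2")  # folder with adapter_model.bin and adapter_config.json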

Is there an existing issue for this?

Reproduction

python server.py --model lama_7b --lora rv2 --cpu

Screenshot

No response

Logs

Loading lama_7b...
llama.cpp weights detected: models\lama_7b\ggml-model-q4_0.bin

llama.cpp: loading model from models\lama_7b\ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 2
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 2052.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  = 2048.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Adding the LoRA rv2 to the model...
Traceback (most recent call last):
  File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\text-generation-webui\server.py", line 473, in <module>
    add_lora_to_model(shared.args.lora)
  File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\text-generation-webui\modules\LoRA.py", line 28, in add_lora_to_model
    shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_name}"), **params)
  File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 143, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
  File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 514, in __init__
    super().__init__(model, peft_config)
  File "C:\Users\ahmet\Desktop\Lamatuning\oobabooga-windows\installer_files\env\lib\site-packages\peft\peft_model.py", line 74, in __init__
    self.config = self.base_model.config
AttributeError: 'LlamaCppModel' object has no attribute 'config'

System Info

Windows 10, Intel Core i7 (9th gen), GTX 1660 Ti Max-Q
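
From the traceback, the failure is inside peft itself: PeftModel.__init__ runs self.config = self.base_model.config, and the LlamaCppModel wrapper used for ggml files has no config attribute because it is not a transformers model, so peft cannot wrap it directly. If the only way forward is to merge the adapter into the HF weights and re-quantize, I imagine it would look roughly like this (untested sketch; merge_and_unload() needs a recent peft version, and the llama.cpp conversion/quantization step is a separate tool):

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# fold the LoRA deltas into the fp16 base weights and save a plain HF checkpoint
base = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "loras/rv2").merge_and_unload()
merged.save_pretrained("llama-7b-rv2-merged")
# "llama-7b-rv2-merged" could then be converted to ggml and quantized to q4_0 with llama.cpp's tools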
github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

Simplegram commented 1 year ago

I have the same issue when running perplexity evaluation with TheBloke/guanaco-33B-GGML.