johnsmith0031 / alpaca_lora_4bit


Target module Autograd4bitQuantLinear() is not supported #96

Open richardburleigh opened 1 year ago

richardburleigh commented 1 year ago

I'm getting the following error when trying to load a model using load_llama_model_4bit_low_ram_and_offload. Any ideas?

Target module Autograd4bitQuantLinear() is not supported. Currently, only torch.nn.Linear and Conv1D are supported.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ alpaca_lora_4bit/inference.py:11 in <module>                                                     │
│                                                                                                  │
│    8 model_path = '/LLaMA/7B/llama-7b-4bit.pt'                                                   │
│    9 lora_path = 'alpaca_lora_4bit/test_remote/test/checkpoin                                    │
│   10 #model, tokenizer = load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1)    │
│ ❱ 11 model, tokenizer = load_llama_model_4bit_low_ram_and_offload(config_path, model_path, lo    │
│   12                                                                                             │
│   13 print('Fitting 4bit scales and zeros to half')                                              │
│   14 model.half()                                                                                │
│                                                                                                  │
│ alpaca_lora_4bit/autograd_4bit.py:254 in                                                         │
│ load_llama_model_4bit_low_ram_and_offload                                                        │
│                                                                                                  │
│   251 │   if lora_path is not None:                                                              │
│   252 │   │   from peft import PeftModel                                                         │
│   253 │   │   from monkeypatch.peft_tuners_lora_monkey_patch import Linear4bitLt                 │
│ ❱ 254 │   │   model = PeftModel.from_pretrained(model, lora_path, device_map={'': 'cpu'}, torc   │
│   255 │   │   print(Style.BRIGHT + Fore.GREEN + '{} Lora Applied.'.format(lora_path))            │
│   256 │                                                                                          │
│   257 │   model.seqlen = seqlen                                                                  │
│                                                                                                  │
│ python3.10/site-packages/peft/peft_model.py:180 in from_pretrained                               │
│                                                                                                  │
│    177 │   │   if config.task_type not in MODEL_TYPE_TO_PEFT_MODEL_MAPPING.keys():               │
│    178 │   │   │   model = cls(model, config, adapter_name)                                      │
│    179 │   │   else:                                                                             │
│ ❱  180 │   │   │   model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config, ad  │
│    181 │   │   model.load_adapter(model_id, adapter_name, **kwargs)                              │
│    182 │   │   return model                                                                      │
│    183                                                                                           │
│                                                                                                  │
│ python3.10/site-packages/peft/peft_model.py:662 in __init__                                      │
│                                                                                                  │
│    659 │   """                                                                                   │
│    660 │                                                                                         │
│    661 │   def __init__(self, model, peft_config: PeftConfig, adapter_name="default"):           │
│ ❱  662 │   │   super().__init__(model, peft_config, adapter_name)                                │
│    663 │   │   self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_f  │
│    664 │                                                                                         │
│    665 │   def forward(                                                                          │
│                                                                                                  │
│ python3.10/site-packages/peft/peft_model.py:99 in __init__                                       │
│                                                                                                  │
│     96 │   │   self.base_model_torch_dtype = getattr(model, "dtype", None)                       │
│     97 │   │   if not isinstance(peft_config, PromptLearningConfig):                             │
│     98 │   │   │   self.peft_config[adapter_name] = peft_config                                  │
│ ❱   99 │   │   │   self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](          │
│    100 │   │   │   │   self.base_model, self.peft_config, adapter_name                           │
│    101 │   │   │   )                                                                             │
│    102 │   │   │   self.set_additional_trainable_modules(peft_config, adapter_name)              │
│                                                                                                  │
│ python3.10/site-packages/peft/tuners/lora.py:132 in __init__                                     │
│                                                                                                  │
│   129 │   │   self.model = model                                                                 │
│   130 │   │   self.forward = self.model.forward                                                  │
│   131 │   │   self.peft_config = config                                                          │
│ ❱ 132 │   │   self.add_adapter(adapter_name, self.peft_config[adapter_name])                     │
│   133 │                                                                                          │
│   134 │   def add_adapter(self, adapter_name, config=None):                                      │
│   135 │   │   if config is not None:                                                             │
│                                                                                                  │
│ python3.10/site-packages/peft/tuners/lora.py:139 in add_adapter                                  │
│                                                                                                  │
│   136 │   │   │   model_config = self.model.config.to_dict() if hasattr(self.model.config, "to   │
│   137 │   │   │   config = self._prepare_lora_config(config, model_config)                       │
│   138 │   │   │   self.peft_config[adapter_name] = config                                        │
│ ❱ 139 │   │   self._find_and_replace(adapter_name)                                               │
│   140 │   │   if len(self.peft_config) > 1 and self.peft_config[adapter_name].bias != "none":    │
│   141 │   │   │   raise ValueError(                                                              │
│   142 │   │   │   │   "LoraModel supports only 1 adapter with bias. When using multiple adapte   │
│                                                                                                  │
│ python3.10/site-packages/peft/tuners/lora.py:217 in                                              │
│ _find_and_replace                                                                                │
│                                                                                                  │
│   214 │   │   │   │   │   │   │   │   )                                                          │
│   215 │   │   │   │   │   │   │   │   kwargs["fan_in_fan_out"] = lora_config.fan_in_fan_out =    │
│   216 │   │   │   │   │   │   else:                                                              │
│ ❱ 217 │   │   │   │   │   │   │   raise ValueError(                                              │
│   218 │   │   │   │   │   │   │   │   f"Target module {target} is not supported. "               │
│   219 │   │   │   │   │   │   │   │   f"Currently, only `torch.nn.Linear` and `Conv1D` are sup   │
│   220 │   │   │   │   │   │   │   )                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Target module Autograd4bitQuantLinear() is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.

johnsmith0031 commented 1 year ago

Did you apply the monkey patch to peft?

jordankzf commented 1 year ago

@johnsmith0031 Sorry, can you please elaborate? I've been stuck trying to finetune a GPTQ model for days.

I'm running finetune.py directly; where would the monkey patch be applied?

python finetune.py ./output.txt \
    --ds_type=txt \
    --lora_out_dir=./test/ \
    --llama_q4_config_dir=./TheBloke_Stable-Platypus2-13B-GPTQ/config.json \
    --llama_q4_model=./TheBloke_Stable-Platypus2-13B-GPTQ/model.safetensors \
    --mbatch_size=1 \
    --batch_size=1 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --groupsize=128 \
    --xformers \
    --backend=cuda
johnsmith0031 commented 1 year ago

It's in the finetune.py file.

from alpaca_lora_4bit.monkeypatch.peft_tuners_lora_monkey_patch import replace_peft_model_with_int4_lora_model
replace_peft_model_with_int4_lora_model()
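
The same ordering applies to the loading script in the original report: the patch has to run before PeftModel.from_pretrained is reached, and that call happens inside load_llama_model_4bit_low_ram_and_offload (see the autograd_4bit.py frame in the traceback). Below is a minimal sketch under that assumption; the loader's import path mirrors the packaged monkeypatch import above, config_path and lora_path are placeholders, and the loader's keyword argument names are guesses based on the truncated call in the traceback.

# Apply the PEFT monkey patch first, before anything constructs a PeftModel.
from alpaca_lora_4bit.monkeypatch.peft_tuners_lora_monkey_patch import (
    replace_peft_model_with_int4_lora_model,
)
replace_peft_model_with_int4_lora_model()

from alpaca_lora_4bit.autograd_4bit import load_llama_model_4bit_low_ram_and_offload

config_path = './llama-7b-4bit/'           # placeholder: directory with the HF config
model_path = '/LLaMA/7B/llama-7b-4bit.pt'  # quantized checkpoint, as in the traceback
lora_path = './test/'                      # placeholder: LoRA adapter directory

# With the patch in place, the quantized Autograd4bitQuantLinear targets should
# be accepted instead of raising the ValueError shown above.
model, tokenizer = load_llama_model_4bit_low_ram_and_offload(
    config_path, model_path, lora_path=lora_path
)
model.half()  # as in the original inference.py: fit the 4-bit scales and zeros to half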
jordankzf commented 1 year ago

The two lines you quoted are intact. Is it possible that the monkeypatch failed to be applied?
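
One hedged way to check that is to print which class PEFT will instantiate for the LoRA adapter before and after the patch call. This assumes replace_peft_model_with_int4_lora_model works by swapping the LoRA entry in peft.peft_model.PEFT_TYPE_TO_MODEL_MAPPING, the mapping used at the peft_model.py:99 frame in the traceback; if the patch targets something else, inspect that attribute instead.

# Hypothetical sanity check, not part of the repo.
import peft.peft_model
from peft import PeftType

from alpaca_lora_4bit.monkeypatch.peft_tuners_lora_monkey_patch import (
    replace_peft_model_with_int4_lora_model,
)

print('before:', peft.peft_model.PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA])
replace_peft_model_with_int4_lora_model()
print('after: ', peft.peft_model.PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA])
# If both lines still show peft.tuners.lora.LoraModel, the patch did not reach
# this mapping and PEFT will keep rejecting Autograd4bitQuantLinear targets.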

jordankzf commented 1 year ago

Solution found at #148 (See this comment)

Commit for reference