johnsmith0031 / alpaca_lora_4bit


ValueError: Target module Autograd4bitQuantLinear() is not supported. #148

Status: Closed (jordankzf closed this issue 12 months ago)

jordankzf commented 12 months ago
Currently, only the following modules are supported: torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, transformers.pytorch_utils.Conv1D.

I'm using text-generation-webui, but the same issue occurs even when using alpaca_lora_4bit directly. I have applied the monkey-patch (I think?):

2023-08-31 12:04:24 INFO:Loading TheBloke_Llama-2-13B-GPTQ...
2023-08-31 12:04:24 WARNING:Applying the monkey patch for using LoRAs with GPTQ models. It may cause undefined behavior outside its intended scope.

Loaded the model using GPTQ-for-LLaMa.

johnsmith0031 commented 12 months ago

Currently, the support in this repo for text-generation-webui is deprecated. If you're using Llama-2 for inference, it's recommended to use exllama because it is much faster.

And the monkey patch is here:

from alpaca_lora_4bit.monkeypatch.peft_tuners_lora_monkey_patch import replace_peft_model_with_int4_lora_model
replace_peft_model_with_int4_lora_model()
jordankzf commented 12 months ago

I'm trying to finetune. I used to be able to do so. Is there any way to restore this functionality?

johnsmith0031 commented 12 months ago

Maybe something is wrong with the new version of peft? What version of peft are you using? It seems like a peft issue. I've updated the monkey patch to support the new peft.

jordankzf commented 12 months ago

I've tried many variants of peft. Thanks for updating your repo.

I've pulled your latest changes 2ef233cc, and am using peft-0.5.0 now.

Unfortunately, I'm still hitting the same error.

$ python finetune.py ./output.txt \
    --ds_type=txt \
    --lora_out_dir=./test/ \
    --llama_q4_config_dir=./TheBloke_Stable-Platypus2-13B-GPTQ/ \
    --llama_q4_model=./TheBloke_Stable-Platypus2-13B-GPTQ/model.safetensors \
    --mbatch_size=1 \
    --batch_size=1 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --groupsize=128 \
    --xformers \
    --backend=cuda
Replaced attention with xformers_attention
Using CUDA implementation.

Parameters:
-------config-------
dataset='./output.txt'
ds_type='txt'
lora_out_dir='./test/'
lora_apply_dir=None
llama_q4_config_dir='./TheBloke_Stable-Platypus2-13B-GPTQ/'
llama_q4_model='./TheBloke_Stable-Platypus2-13B-GPTQ/model.safetensors'

------training------
mbatch_size=1
batch_size=1
gradient_accumulation_steps=1
epochs=3
lr=0.0003
cutoff_len=256
lora_r=8
lora_alpha=16
lora_dropout=0.05
val_set_size=0.2
gradient_checkpointing=False
gradient_checkpointing_ratio=1
warmup_steps=5
save_steps=50
save_total_limit=3
logging_steps=5
checkpoint=False
skip=False
world_size=1
ddp=False
device_map='auto'
groupsize=128
v1=False
backend='cuda'

Loading Model ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loaded the model in 4.15 seconds.
Traceback (most recent call last):
  File "/home/jordan/alpaca_lora_4bit/finetune.py", line 89, in <module>
    model = get_peft_model(model, lora_config)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/mapping.py", line 106, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/peft_model.py", line 889, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/peft_model.py", line 111, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/lora.py", line 274, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 88, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 219, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, **optionnal_kwargs)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/lora.py", line 372, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/home/jordan/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/lora.py", line 481, in _create_new_module
    raise ValueError(
ValueError: Target module Autograd4bitQuantLinear() is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
johnsmith0031 commented 12 months ago

If the latest version is correctly installed, there should be a log line like: "Repalced _create_new_module and _replace_module function". You can try uninstalling and reinstalling alpaca_lora_4bit.

cd alpaca_lora_4bit
pip uninstall alpaca_lora_4bit
pip uninstall alpaca_lora_4bit # uninstall again to ensure that you do not have another version
pip install .
jordankzf commented 12 months ago

Works! Thank you very much @johnsmith0031

On a side note, would you consider adding a "Troubleshooting" section to the README? I would be happy to assist with this.

johnsmith0031 commented 12 months ago

Yes, I think I'll add it to the install manual.