johnsmith0031 / alpaca_lora_4bit

MIT License

LoRA Output Identical to Base Model #137

Closed LegendBegins closed 1 year ago

LegendBegins commented 1 year ago

Hi, I'm doing some finetuning on Wizard Vicuna 13B Quantized, and when I load the model with the resulting LoRA applied and a set seed, the output matches that of the base model. I'm using SAD-formatted data with instructions, outputs, and empty input fields.
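
For reference, each record in my dataset looks roughly like this (just a sketch; the field names follow the usual alpaca template and the content here is made up, not my actual data):

```python
# Rough shape of one training record (alpaca-style fields; the values are
# made up for illustration, not my actual data).
record = {
    "instruction": "Summarize the following conversation in one sentence.",
    "input": "",  # input fields are left empty in my dataset
    "output": "The participants agreed to meet again on Friday.",
}
```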

Command I use for training:

python finetune.py ../data.txt --ds_type=alpaca --lora_out_dir=./output/ --llama_q4_config_dir=../ --llama_q4_model=../Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors --mbatch_size=1 --batch_size=2 --epochs=5 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=50 --save_total_limit=3 --logging_steps=5 --groupsize=128 --xformers --backend=triton

I've modified inference.py and replaced the loader with load_llama_model_4bit_low_ram_and_offload and lora_path. I've also tested the resulting model in exllama, which also doesn't see any change between the LoRA and base models. I would appreciate any insight!
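
The relevant part of my modified inference.py looks roughly like this (argument names are from memory and may not match the current autograd_4bit signature exactly):

```python
# Roughly the loader call in my modified inference.py (argument names are from
# memory and may not match the current autograd_4bit signature exactly; the
# import path also depends on whether the pip package or the flat repo layout is used).
from alpaca_lora_4bit.autograd_4bit import load_llama_model_4bit_low_ram_and_offload

lora_path = "./output/"  # directory holding adapter_config.json / adapter_model.bin
model, tokenizer = load_llama_model_4bit_low_ram_and_offload(
    "../",  # llama_q4_config_dir
    "../Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors",
    lora_path=lora_path,
    groupsize=128,
)
```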

johnsmith0031 commented 1 year ago

You can compare the logits output by the model with and without the LoRA applied. I think the logits should be different as long as the LoRA is finetuned correctly.
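
Something like this, just as a sketch (it assumes the LoRA is loaded through peft and that `model` and `tokenizer` are already set up as in inference.py):

```python
# Rough sketch of the logits check. Assumes `model` is a peft PeftModel with the
# LoRA applied and `tokenizer` is the matching tokenizer; the prompt is just an example.
import torch

prompt = "### Instruction:\nSay hello.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits_with_lora = model(**inputs).logits
    with model.disable_adapter():  # peft context manager that bypasses the LoRA layers
        logits_base = model(**inputs).logits

# If this prints ~0, the adapter is not changing the forward pass at all.
print((logits_with_lora - logits_base).abs().max().item())
```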

pauliustumas commented 1 year ago

I have noticed the same issue. I found that if you use --grad_chckpt in finetune.py and then use the checkpoint as the LoRA adapter, the issue is solved.
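
With the command you posted, that would be the same invocation plus the extra flag, e.g.:

python finetune.py ../data.txt --ds_type=alpaca --grad_chckpt --lora_out_dir=./output/ --llama_q4_config_dir=../ --llama_q4_model=../Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors --mbatch_size=1 --batch_size=2 --epochs=5 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=50 --save_total_limit=3 --logging_steps=5 --groupsize=128 --xformers --backend=triton

and then load one of the checkpoint folders written to --lora_out_dir as the LoRA adapter instead of the final adapter (the exact folder names depend on your save_steps).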

LegendBegins commented 1 year ago

> I have noticed the same issue. I found that if you use --grad_chckpt in finetune.py and then use the checkpoint as the LoRA adapter, the issue is solved.

I tried this solution, but I'm getting the same behavior regardless. I've retrained with a number of different argument combinations, but adapter_model.bin is always 26271757 bytes and produces output identical to the base model when loaded as a LoRA adapter. In case it's useful, here's the output:

stderr:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The safetensors archive passed at ../Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 3799.19it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00,  8.73it/s]
100%|██████████| 1/1 [00:00<00:00, 14.45it/s]

stdout:

quant_cuda not found. Please run "pip install alpaca_lora_4bit[cuda]".
Replaced attention with xformers_attention
Using Triton implementation.

Parameters:
-------config-------
dataset='../discord_training_list.txt'
ds_type='alpaca'
lora_out_dir='./output/'
lora_apply_dir=None
llama_q4_config_dir='../'
llama_q4_model='../Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors'

------training------
mbatch_size=8
batch_size=8
gradient_accumulation_steps=1
epochs=5
lr=0.0003
cutoff_len=4096
lora_r=8
lora_alpha=16
lora_dropout=0
val_set_size=0.2
gradient_checkpointing=True
gradient_checkpointing_ratio=1
warmup_steps=5
save_steps=50
save_total_limit=3
logging_steps=5
checkpoint=True
skip=False
world_size=1
ddp=False
device_map='auto'
groupsize=128
v1=False
backend='triton'

Disable Dropout.
Loading Model ...
Loaded the model in 10.83 seconds.
Fitting 4bit scales and zeros to half
Downloading and preparing dataset /fakepath...
Dataset json downloaded and prepared to /fakepath/. Subsequent calls will reuse this data.
Applying gradient checkpointing ...
Forward Patch Applied For Block 0
<snipped>
Forward Patch Applied For Block 39
Var Wrapper Patch Applied
Run eval every 1754 steps
Train completed.
Model Saved.
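
For what it's worth, here's the quick check I'd run on the saved adapter to see whether the LoRA weights ever moved away from zero (a rough sketch; key names assumed from peft's usual lora_A/lora_B naming):

```python
# Rough sketch: inspect the saved adapter directly. In a standard peft LoRA the
# lora_B matrices start at zero, so if their max magnitude is still 0 after
# training, the adapter is a no-op and identical outputs are expected.
import torch

state = torch.load("./output/adapter_model.bin", map_location="cpu")
for name, tensor in state.items():
    if "lora_B" in name:  # key naming assumed from peft's usual convention
        print(name, tensor.float().abs().max().item())
```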

LegendBegins commented 1 year ago

Update: Today I'm that guy. @pauliustumas's solution worked fine; I was just on the wrong branch. I also had to disable wandb for my use case. Hopefully this helps someone else who runs into the same issue.
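
(For anyone else who hits the wandb prompt: as far as I know the standard wandb switch is enough, nothing repo-specific, e.g. prefixing the training command with the env var

WANDB_MODE=disabled python finetune.py ../data.txt --ds_type=alpaca --grad_chckpt ...

with the rest of the flags as in the command further up, or running `wandb disabled` once beforehand.)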