johnsmith0031 / alpaca_lora_4bit

MIT License
533 stars 84 forks

Error attempting to finetune llama2-70b #139

Open tensiondriven opened 1 year ago

tensiondriven commented 1 year ago

Hola! I attempted to do a fine-tuning run against Llama-2-70B-GPTQ (branch with file: gptq_model-4bit--1g.safetensors) this morning, and ran into the following error.

Is this perhaps a configuration issue on my end, or does the repo need to be updated to support llama2?

Also, any feedback on how my parameters are configured would be welcome. I'm still having a heck of a time finding concrete information about how some of the parameters are applied (micro_batch_size vs batch_size, lora_r, lora_alpha, lora_dropout).
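For what it's worth, here is my rough mental model of those knobs, pieced together from the config dump the script prints below (where gradient_accumulation_steps shows up as batch_size / mbatch_size); this is just a sketch of my assumptions, so corrections welcome:

```python
# My working assumptions (please correct me if wrong):
# mbatch_size (micro_batch_size) -> examples per forward/backward pass on a device
# batch_size                     -> effective examples per optimizer step
# The trainer appears to bridge the two with gradient accumulation:
mbatch_size = 2
batch_size = 4
gradient_accumulation_steps = batch_size // mbatch_size  # -> 2, matching the printed config

# LoRA knobs, as I understand them:
# lora_r       -> rank of the low-rank update matrices (adapter capacity)
# lora_alpha   -> scaling factor; the update is scaled by lora_alpha / lora_r
# lora_dropout -> dropout on the LoRA path during training (0 disables it)
```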

Update: I looked through the available branches and couldn't find any containing a file named "gptq_model-4bit--1g.safetensors"; I wonder if the quant that I downloaded has been replaced.

The file gptq_model-4bit--1g.safetensors is the file from the main/master branch.

Any advice on which quant(s) to use for fine-tuning? Is compatibility better or worse with one versus another? Also, do some quants hold up better than others when a LoRA is applied? Any info would be quite useful.

Command line:

output_dir="/home/j/training/output"
dataset_dir="/home/j/training/dataset"
dataset_file="$output_dir/dataset.txt"
log_file="$output_dir/log.txt"
base_model="~/models/llama-70b-base/gptq_model-4bit--1g.safetensors"
base_model_dir=$(dirname "$base_model")

echo "Start training."
python /home/j/ml/training/alpaca_lora_4bit/finetune.py \
  $dataset_file \
  --ds_type txt \
  --lora_out_dir "$output_dir" \
  --llama_q4_config_dir $base_model_dir \
  --llama_q4_model $base_model \
  --verbose \
  --xformers \
  --lora_r 128 \
  --lora_alpha 256 \
  --val_set_size 0.1 \
  --mbatch_size 2 \
  --batch_size 4 \
  --lr 0.0002 \
  --epochs 3 \
  --cutoff_len 512 \
  --warmup_steps 200 \
  --save_steps 2000 \
  --save_total_limit 6 \
  --grad_chckpt

Output:

Start training.
Replaced attention with xformers_attention
Using CUDA implementation.

Parameters:
-------config-------
dataset='/home/j/training/dataset.txt'
ds_type='txt'
lora_out_dir='/home/j/output'
lora_apply_dir=None
llama_q4_config_dir='/mnt/ml-metamind/ml/model-repos/llama2-70b-base'
llama_q4_model='/mnt/ml-metamind/ml/model-repos/llama2-70b-base/gptq_model-4bit--1g.safetensors'

------training------
mbatch_size=2
batch_size=4
gradient_accumulation_steps=2
epochs=3
lr=0.0002
cutoff_len=512
lora_r=128
lora_alpha=256
lora_dropout=0
val_set_size=0.1
gradient_checkpointing=True
gradient_checkpointing_ratio=1
warmup_steps=200
save_steps=2000
save_total_limit=6
logging_steps=10
checkpoint=False
skip=False
world_size=1
ddp=False
device_map='auto'
groupsize=-1
v1=False
backend='cuda'

Disable Dropout.
Loading Model ...
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The safetensors archive passed at /mnt/ml-metamind/ml/model-repos/llama2-70b-base/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Loaded the model in 34.32 seconds.                                                                         
Fitting 4bit scales and zeros to half
Train Data: 0.00% outliers                                       
Applying gradient checkpointing ...
Forward Patch Applied For Block 0
Forward Patch Applied For Block 1
Forward Patch Applied For Block 2
...
Forward Patch Applied For Block 77
Forward Patch Applied For Block 78
Forward Patch Applied For Block 79
Var Wrapper Patch Applied
The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: input. If input are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 9,942
  Num Epochs = 3
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 2
  Total optimization steps = 7,455
  Number of trainable parameters = 335,544,320
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
  0%|          | 0/7455 [00:00<?, ?it/s]wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run vivid-feather-20 at: https://wandb.ai/tensiondriven/alpaca_lora_4bit/runs/jashc05u
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230720_071040-jashc05u/logs
Traceback (most recent call last):
  File "/home/j/ml/training/alpaca_lora_4bit/finetune.py", line 186, in <module>
    trainer.train()
  File "/home/j/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/j/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1802, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/j/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2647, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/j/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2672, in compute_loss
    outputs = model(**inputs)
  File "/home/j/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/j/.local/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/peft/peft_model.py", line 663, in forward
    return self.base_model(
  File "/home/j/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 691, in forward
    outputs = self.model(
  File "/home/j/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 579, in forward
    layer_outputs = decoder_layer(
  File "/home/j/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/ml-metamind/ml/training/alpaca_lora_4bit/gradient_checkpointing.py", line 24, in new_forward
    output = checkpoint(func, *args)
  File "/home/j/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/j/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/j/.local/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/mnt/ml-metamind/ml/training/alpaca_lora_4bit/gradient_checkpointing.py", line 21, in func
    return self.layer.old_forward_for_cp(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 294, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/j/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/j/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/mnt/ml-metamind/ml/training/alpaca_lora_4bit/monkeypatch/llama_attn_hijack_xformers.py", line 34, in xformers_forward
    key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[2, 30, 64, 128]' is invalid for input of size 61440
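For whatever it's worth, the numbers in the error look consistent with a grouped-query-attention mismatch rather than a corrupt quant. A quick sanity check, assuming Llama-2-70B's published config (64 attention heads, 8 key/value heads, head_dim 128):

```python
# Shape the xformers patch tries to view the k_proj output into:
expected = 2 * 30 * 64 * 128   # bsz * q_len * num_heads * head_dim = 491520

# Size actually reported in the error:
actual = 61440

# With GQA, k_proj only produces num_key_value_heads * head_dim features per token:
gqa = 2 * 30 * 8 * 128         # bsz * q_len * num_kv_heads * head_dim = 61440

print(expected, actual, gqa)   # 491520 61440 61440
```

So the key/value projection appears to be 8 heads wide, not 64, and the patch seems to assume the k/v projections have the same number of heads as the queries, which isn't true for the 70B model.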
johnsmith0031 commented 1 year ago

It seems that the original xformers monkey patch does not support GQA. I updated the monkey patch, but I'm not sure if it works.
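Roughly, the change needed is to view the key/value projections with num_key_value_heads and repeat them up to the query head count before attention. A sketch following the upstream transformers convention (not necessarily a verbatim copy of the updated patch):

```python
import torch

def repeat_kv(states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # (bsz, num_kv_heads, seq_len, head_dim) -> (bsz, num_kv_heads * n_rep, seq_len, head_dim)
    bsz, num_kv_heads, seq_len, head_dim = states.shape
    if n_rep == 1:
        return states
    states = states[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, seq_len, head_dim)
    return states.reshape(bsz, num_kv_heads * n_rep, seq_len, head_dim)

# Inside the patched attention forward (sketch):
# query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
# key_states   = self.k_proj(hidden_states).view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
# value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
# key_states   = repeat_kv(key_states, self.num_heads // self.num_key_value_heads)
# value_states = repeat_kv(value_states, self.num_heads // self.num_key_value_heads)
```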

laoda513 commented 1 year ago

I can train and save the LoRA file with finetune.py on the latest commit. But when I load the saved LoRA file and run inference, the output is broken within alpaca_lora_4bit, and it throws an error within exllama.

tensiondriven commented 1 year ago

@laoda513 Does the LoRA work when applied using text-generation-webui?

laoda513 commented 1 year ago

@tensiondriven Hmm, this is quite strange... I found that it's not the LoRA that breaks the output. Inference is broken by itself in finetune.py even without loading the LoRA (I commented out the training code). (screenshot attached)

But the training still seems to work... I'm testing loading the LoRA and model with exllama to see if everything is fine...

I'm currently using only a script instead of text-generation-webui, to keep things simple.

johnwick123f commented 11 months ago

I believe the problem might be that Llama 2 70B uses GQA while none of the other LLaMA models do, so that might have been messing something up.