meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.

Finetune CodeLlama-7b-Instruct-hf on private dataset #155

Open HumzaSami00 opened 10 months ago

HumzaSami00 commented 10 months ago

I hope this message finds you well. I recently had the opportunity to experiment with the CodeLlama-7b-Instruct model from the GitHub repository and was pleased to observe its promising performance. Encouraged by these initial results, I am interested in fine-tuning this model on my proprietary code chat dataset. I have a single 3090 with 24 GB of VRAM.

To provide you with more context, my dataset has the following structure:

1. <s>[INST] {{user}} [/INST] {{assistant}} </s><s>[INST] {{user}} [/INST] {{assistant}} </s>
2. <s>[INST] {{user}} [/INST] {{assistant}} </s><s>[INST] {{user}} [/INST] {{assistant}} </s>

I have a total of 1000 such chat examples in my dataset.
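For reference, here is a minimal sketch (not from the original post) of how such chat strings could be turned into a tokenized Hugging Face Dataset with the labels a causal-LM Trainer expects. The variable names (chat_examples, X_train) are assumptions chosen to match the script below, which also defines the tokenizer:

from datasets import Dataset

chat_examples = [
    "<s>[INST] {{user}} [/INST] {{assistant}} </s>",  # ... 1000 such strings
]

def tokenize(batch):
    # pad to a fixed length because default_data_collator does no padding
    out = tokenizer(batch["text"], truncation=True, max_length=2048, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: the targets are the inputs
    return out

X_train = Dataset.from_dict({"text": chat_examples}).map(
    tokenize, batched=True, remove_columns=["text"]
)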

Could you kindly guide me through the recommended pipeline or steps to effectively fine-tune the Codellama-7b-Instruct model on my specific chat dataset? I look forward to your guidance.

EDIT

I followed this pipeline, but it's giving me the following error:

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

MODEL_NAME = "codellama/CodeLlama-7b-Instruct-hf"

model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map='auto', torch_dtype=torch.bfloat16)
tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)

model.train()

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules = ["q_proj", "v_proj"]
    )

    # prepare int-8 model for training
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)

from transformers import TrainerCallback
from contextlib import nullcontext
enable_profiler = False
output_dir = "result"

config = {
    'lora_config': lora_config,
    'learning_rate': 1e-4,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 10,
    'gradient_checkpointing': False,
}

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)

    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler

        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()

from transformers import default_data_collator, Trainer, TrainingArguments

# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    bf16=True,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=X_train,
        data_collator=default_data_collator,
        callbacks=[profiler_callback] if enable_profiler else [],
    )

    # Start training
    trainer.train()

ERROR

   2680     return loss_mb.reduce_mean().detach().to(self.args.device)
   2682 with self.compute_loss_context_manager():
-> 2683     loss = self.compute_loss(model, inputs)
   2685 if self.args.n_gpu > 1:
   2686     loss = loss.mean()  # mean() to average on multi-gpu parallel training

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask.
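A hedged note on the traceback above, not from the thread: default_data_collator passes dataset columns through unchanged, and the model only computes a loss when a labels key is present in the batch. Since the dataset here carries only input_ids and attention_mask, no loss is returned. Under that assumption, the usual fix is to copy input_ids into labels:

X_train = X_train.map(lambda batch: {"labels": batch["input_ids"].copy()}, batched=True)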
HamidShojanazeri commented 10 months ago

@HumzaSami00 I suggest giving the single-GPU training script a try as well. Also, to get better quality: per the QLoRA paper, adapting all linear layers rather than just the attention's linear layers seems to help performance.

python llama_finetuning.py  --use_peft --peft_method lora --quantization --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model

Also, to use the HF Trainer, make sure to use AutoTokenizer in the code above, as CodeLlama does not use LlamaTokenizer. Make sure as well to install HF transformers from source: pip install git+https://github.com/huggingface/transformers.
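Putting those two suggestions together, a minimal sketch (my reading of the comment above, not code from the repo), assuming the standard Llama-style module names for the attention and MLP projections:

from transformers import AutoTokenizer
from peft import LoraConfig, TaskType

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    # all linear layers in a Llama decoder block, not just q_proj/v_proj
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)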

HumzaSami00 commented 10 months ago

@HamidShojanazeri, thanks for your response. I have a few questions.

  1. How can I use my custom dataset (a Hugging Face dataset object) in llama_finetuning.py? There is no argument for a custom dataset in this script. Do I have to edit the script manually?
  2. You mentioned that CodeLlama does not use LlamaTokenizer, but in llama_finetuning.py, LlamaTokenizer is used as the tokenizer.
  3. How can I edit the PEFT configuration in llama_finetuning.py to fine-tune all linear layers?

Edit: I tried the following command and got this error. According to llama_finetuning.py, it seems to accept a Hugging Face model, but according to the README, we can also pass the path to a downloaded model.

input:

python llama-recipes/llama_finetuning.py  --use_peft --peft_method lora --quantization --model_name ./CodeLlama-7b/ --output_dir result

Output:

OSError: ./CodeLlama-7b-Instruct/ does not appear to have a file named config.json. Checkout 'https://huggingface.co/./CodeLlama-7b-Instruct//main' for available files.

My downloaded model has these files in the folder:

I am not using an HF model. I downloaded the model using the download.sh file from GitHub.
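The OSError above is consistent with Meta-format weights (consolidated .pth checkpoints and params.json) rather than the Hugging Face layout, which requires config.json. A hedged sketch of the usual conversion, assuming a transformers source checkout; the exact --input_dir layout depends on how download.sh arranged the files:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir ./CodeLlama-7b-Instruct --model_size 7B --output_dir ./CodeLlama-7b-Instruct-hf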

wukaixingxp commented 1 month ago

Hi! Please read this document on how to fine-tune Llama using custom data. Let me know if you have more questions!
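For completeness, a hedged sketch of the custom-dataset hook as I understand it from the llama-recipes documentation; the file name my_dataset.py and the JSON data file are assumptions for illustration:

# my_dataset.py -- loaded by llama-recipes, which calls get_custom_dataset(...)
import datasets

def get_custom_dataset(dataset_config, tokenizer, split):
    # load the 1000 chat strings, e.g. from a local JSON file (assumed name)
    ds = datasets.load_dataset("json", data_files="chat_examples.json", split="train")

    def tokenize(sample):
        out = tokenizer(sample["text"], truncation=True, max_length=2048)
        out["labels"] = out["input_ids"].copy()
        return out

    return ds.map(tokenize, remove_columns=["text"])

It would then be passed on the command line, e.g.:

python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --dataset custom_dataset --custom_dataset.file "my_dataset.py" --model_name codellama/CodeLlama-7b-Instruct-hf --output_dir result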