Closed: lx0126z closed this issue 1 year ago.
Hi @lx0126z, thank you very much for pointing out the issue. Could you share a small reproducible script for the problem you are facing?
import torch
import transformers
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# model_path, train_data, val_data, compute_metrics, tokenizer and opt are defined elsewhere.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    max_memory={0: "24GB", 1: "24GB"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    compute_metrics=compute_metrics,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=10,
        warmup_steps=20,
        num_train_epochs=1,
        learning_rate=0.0003,
        fp16=True,
        logging_steps=10,
        optim="paged_adamw_8bit",
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=50,
        save_steps=50,
        include_inputs_for_metrics=True,
        output_dir=opt.output_dir,
        save_total_limit=5,
        load_best_model_at_end=True,
        ddp_find_unused_parameters=None,
        group_by_length=False,
        report_to='wandb',
        run_name='abc',
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
model.config.use_cache = False
trainer.train()
This is the code (without the definitions of train_data and val_data). The problem happens at the last step, trainer.train(): the file 'adapter_model.bin' is saved, but the saved file is empty.
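As a side note (a minimal sketch, not part of the original report): one way to confirm whether a saved adapter file is actually empty is to load it and inspect its keys. The checkpoint path below is hypothetical and should point at whatever directory your Trainer wrote out.

import torch

# Hypothetical path; replace with the checkpoint directory your run produced.
state_dict = torch.load("output_dir/checkpoint-50/adapter_model.bin", map_location="cpu")
print(len(state_dict))  # 0 reproduces the "empty file" symptom
for key in list(state_dict)[:5]:
    # A healthy LoRA checkpoint contains lora_A / lora_B tensors for each target module.
    print(key, tuple(state_dict[key].shape))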
Hi @lx0126z, I tried to reproduce the issue; for me the state dict was not empty. The script I used is the one below:
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training, PeftModel
from transformers import AutoModelForCausalLM
import transformers
import tempfile
from datasets import load_dataset
model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_4bit=True)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
def build_dataset(model_id, dataset_name="imdb"):
    """
    Build dataset for training. This builds the dataset from `load_dataset`; one should
    customize this function to train the model on their own dataset.

    Args:
        dataset_name (`str`):
            The name of the dataset to be loaded.

    Returns:
        dataloader (`torch.utils.data.DataLoader`):
            The dataloader for the dataset.
    """
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # load imdb with datasets
    ds = load_dataset(dataset_name, split="train")
    ds = ds.rename_columns({"text": "review"})
    ds = ds.filter(lambda x: len(x["review"]) > 200, batched=False)

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["review"])
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize, batched=False)
    ds.set_format(type="torch")
    # remove_columns returns a new dataset; assign it back so the columns are actually dropped
    ds = ds.remove_columns(["review", "label", "query"])
    return ds
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

dataset = build_dataset(model_id)
model = get_peft_model(model, config)

with tempfile.TemporaryDirectory() as tmp_dirname:
    trainer = transformers.Trainer(
        model=model,
        train_dataset=dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=10,
            warmup_steps=0,
            max_steps=3,
            learning_rate=0.0003,
            fp16=True,
            logging_steps=10,
            optim="paged_adamw_8bit",
            save_strategy="steps",
            save_steps=1,
            output_dir=tmp_dirname,
            save_total_limit=5,
            group_by_length=False,
            report_to='wandb',
            run_name='abc',
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(
            tokenizer, pad_to_multiple_of=8, return_tensors="pt", mlm=False
        ),
    )
    model.config.use_cache = False
    trainer.train()
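For anyone who wants to check concretely what "the state dict was not empty" means, here is a small sketch using PEFT's public helper to inspect the adapter weights before saving:

from peft import get_peft_model_state_dict

# Collect only the adapter (LoRA) weights; this is what ends up in adapter_model.bin.
peft_state_dict = get_peft_model_state_dict(model)
print(f"{len(peft_state_dict)} adapter tensors")  # should be > 0 after get_peft_model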
I am using transformers from source, which contains multiple recent fixes for Trainer + PEFT saving. Can you try the snippet I shared, and also uninstall transformers and install it from source?
pip install git+https://github.com/huggingface/transformers.git
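After reinstalling, you can double-check that the source build is the one being picked up; a dev version suffix usually indicates a source install:

import transformers
print(transformers.__version__)  # source installs typically report a ".dev0" suffix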
Yes, you are right, thanks for your help! I haven't run into the problem again.
Thank you! Closing the issue for now; feel free to re-open it in case you face other issues.
System Info
In this code, I find the adapter_name is 'default', but there is no module named 'default' in the model.
Who can help?
No response
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I use PEFT to train LLaMA with LoRA.
Expected behavior
If I delete the adapter_name, is that right?
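For reference: when get_peft_model is called without an explicit adapter_name, PEFT registers the adapter under the name "default", and that name appears inside the injected LoRA parameter paths rather than as a standalone module. A minimal sketch of how to observe this (assuming a LoRA-wrapped causal LM as in the scripts above; exact parameter paths vary by model):

# After model = get_peft_model(model, config), the adapter is registered as "default".
print(model.active_adapter)  # -> "default"

# The adapter name is embedded in the injected LoRA parameter paths,
# e.g. "...self_attn.q_proj.lora_A.default.weight", not as a top-level module.
for name, _ in model.named_parameters():
    if "lora_" in name:
        print(name)
        break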