Closed: lx0126z closed this issue 1 year ago.
Hi @lx0126z, thank you very much for pointing out the issue. Could you share a small reproducible script for the problem you are facing?
import torch
import transformers
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# model_path, train_data, val_data, compute_metrics, tokenizer and opt are defined elsewhere.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    max_memory={0: "24GB", 1: "24GB"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    compute_metrics=compute_metrics,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=10,
        warmup_steps=20,
        num_train_epochs=1,
        learning_rate=0.0003,
        fp16=True,
        logging_steps=10,
        optim="paged_adamw_8bit",
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=50,
        save_steps=50,
        include_inputs_for_metrics=True,
        output_dir=opt.output_dir,
        save_total_limit=5,
        load_best_model_at_end=True,
        ddp_find_unused_parameters=None,
        group_by_length=False,
        report_to='wandb',
        run_name='abc',
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
model.config.use_cache = False
trainer.train()
This is the code (without the definitions of train_data and val_data). The problem happens at the last step, trainer.train(): the file 'adapter_model.bin' is saved, but the saved file is empty.
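As a side note (a minimal sketch, not part of the original report): one way to confirm whether a saved adapter file is actually empty is to load it and inspect its keys. The checkpoint path below is hypothetical and should point at whatever directory your Trainer wrote out.

import torch

# Hypothetical path; replace with the checkpoint directory your run produced.
state_dict = torch.load("output_dir/checkpoint-50/adapter_model.bin", map_location="cpu")
print(len(state_dict))  # 0 reproduces the "empty file" symptom
for key in list(state_dict)[:5]:
    # A healthy LoRA checkpoint contains lora_A / lora_B tensors for each target module.
    print(key, tuple(state_dict[key].shape))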
Hi @lx0126z, I tried to reproduce the issue; for me the state dict was not empty. The script I used is the one below:
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training, PeftModel
from transformers import AutoModelForCausalLM
import transformers
import tempfile
from datasets import load_dataset
model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_4bit=True)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
def build_dataset(model_id, dataset_name="imdb"):
    """
    Build dataset for training. This builds the dataset from `load_dataset`; one should
    customize this function to train the model on their own dataset.

    Args:
        dataset_name (`str`):
            The name of the dataset to be loaded.

    Returns:
        dataloader (`torch.utils.data.DataLoader`):
            The dataloader for the dataset.
    """
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # load imdb with datasets
    ds = load_dataset(dataset_name, split="train")
    ds = ds.rename_columns({"text": "review"})
    ds = ds.filter(lambda x: len(x["review"]) > 200, batched=False)

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["review"])
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize, batched=False)
    ds.set_format(type="torch")
    # remove_columns returns a new dataset; assign it back so the columns are actually dropped
    ds = ds.remove_columns(["review", "label", "query"])
    return ds
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

dataset = build_dataset(model_id)
model = get_peft_model(model, config)

with tempfile.TemporaryDirectory() as tmp_dirname:
    trainer = transformers.Trainer(
        model=model,
        train_dataset=dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=10,
            warmup_steps=0,
            max_steps=3,
            learning_rate=0.0003,
            fp16=True,
            logging_steps=10,
            optim="paged_adamw_8bit",
            save_strategy="steps",
            save_steps=1,
            output_dir=tmp_dirname,
            save_total_limit=5,
            group_by_length=False,
            report_to='wandb',
            run_name='abc',
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(
            tokenizer, pad_to_multiple_of=8, return_tensors="pt", mlm=False
        ),
    )
    model.config.use_cache = False
    trainer.train()
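For anyone who wants to check concretely what "the state dict was not empty" means, here is a small sketch using PEFT's public helper to inspect the adapter weights before saving:

from peft import get_peft_model_state_dict

# Collect only the adapter (LoRA) weights; this is what ends up in adapter_model.bin.
peft_state_dict = get_peft_model_state_dict(model)
print(f"{len(peft_state_dict)} adapter tensors")  # should be > 0 after get_peft_model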
I am using transformers from source, which contains multiple recent fixes for Trainer + PEFT saving. Can you try the snippet I shared, and also uninstall transformers and install it from source?
pip install git+https://github.com/huggingface/transformers.git
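After reinstalling, you can double-check that the source build is the one being picked up; a dev version suffix usually indicates a source install:

import transformers
print(transformers.__version__)  # source installs typically report a ".dev0" suffix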
Yes, you are right, thanks for your help! I haven't run into the problem again.
Thank you! Closing the issue for now; feel free to re-open it in case you face other issues.
System Info
In this code, I find the adapter_name is 'default', but there is no module named 'default' in the model.
Who can help?
No response
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I use PEFT to train LLaMA with LoRA.
Expected behavior
If I delete the adapter_name, is that right?
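For reference: when get_peft_model is called without an explicit adapter_name, PEFT registers the adapter under the name "default", and that name appears inside the injected LoRA parameter paths rather than as a standalone module. A minimal sketch of how to observe this (assuming a LoRA-wrapped causal LM as in the scripts above; exact parameter paths vary by model):

# After model = get_peft_model(model, config), the adapter is registered as "default".
print(model.active_adapter)  # -> "default"

# The adapter name is embedded in the injected LoRA parameter paths,
# e.g. "...self_attn.q_proj.lora_A.default.weight", not as a top-level module.
for name, _ in model.named_parameters():
    if "lora_" in name:
        print(name)
        break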