Open · BUILDERlym opened this issue 2 months ago
Hello @BUILDERlym, can you share the model (or a dummy version) so we can try to reproduce the error please?
For SFT:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    load_in_4bit=False,
    torch_dtype=torch.bfloat16,
    device_map=device_map,  # device_map comes from the script's own arguments
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
config = LoraConfig(
    r=LORA_R,  # LoRA hyperparameters are defined elsewhere in the script
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
Then use transformers.Trainer() to train and save.
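A minimal sketch of that training call, assuming placeholder TrainingArguments and an already-tokenized train_dataset (the output directory, batch size, and epoch count below are not from the original setup):

import transformers

trainer = transformers.Trainer(
    model=model,  # the PEFT-wrapped model from above
    args=transformers.TrainingArguments(
        output_dir="sft_output",         # placeholder
        per_device_train_batch_size=4,   # placeholder
        num_train_epochs=1,              # placeholder
        bf16=True,
    ),
    train_dataset=train_dataset,  # assumed: tokenized SFT dataset
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("sft_output")  # saves the LoRA adapter weights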
Merge the PEFT adapter:
from peft import PeftConfig, PeftModel

peft_config = PeftConfig.from_pretrained(peft_model_id)  # peft_model_id points at the saved adapter
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    return_dict=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    load_in_4bit=False,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()
model = model.merge_and_unload()
and save the merged model.
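For the save step, a minimal sketch (assuming the merged weights go into the same "saved_sft_model" directory that the PPO step below loads from):

model.save_pretrained("saved_sft_model")      # merged, adapter-free weights
tokenizer.save_pretrained("saved_sft_model")  # keep the tokenizer next to the weights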
For Reward Model:
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    num_labels=1,  # single scalar reward
    torch_dtype=torch.bfloat16,
    device_map=device_map,
    trust_remote_code=True,
)
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=16,  # 32,
    lora_dropout=0.05,  # 0.1,
    bias="none",
)
model = get_peft_model(model, peft_config)
import torch.nn as nn
from transformers import Trainer

class RewardTrainer(Trainer):
    # Pairwise loss: the chosen response (j) should be scored higher than the rejected one (k).
    def compute_loss(self, model, inputs, return_outputs=False):
        rewards_j = model(
            input_ids=inputs["input_ids_j"],
            attention_mask=inputs["attention_mask_j"],
        )[0]
        rewards_k = model(
            input_ids=inputs["input_ids_k"],
            attention_mask=inputs["attention_mask_k"],
        )[0]
        loss = -nn.functional.logsigmoid(rewards_j - rewards_k).mean()
        if return_outputs:
            return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}
        return loss
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=RewardDataCollatorWithPadding(
        tokenizer=tokenizer, max_length=512, pad_to_multiple_of=8
    ),
)
model.config.use_cache = False
trainer.train(script_args.resume_from_reward_checkpoint)
The same merging process is used for the reward-model adapter.
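A sketch of that merge for the sequence-classification model (the adapter and output paths are placeholders):

import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
reward_model = PeftModel.from_pretrained(base, "reward_adapter")  # placeholder adapter path
reward_model = reward_model.merge_and_unload()
reward_model.save_pretrained("saved_reward_model")  # placeholder output directory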
For PPO:
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "saved_sft_model",
    load_in_4bit=False,
    device_map="auto",
    peft_config=lora_config,
    torch_dtype=torch.bfloat16,
)
This step is the one that produces the v_head warning in question.
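The lora_config passed above is not shown in the snippet; a hypothetical definition, reusing CAUSAL_LM settings similar to the SFT step, might look like this:

from peft import LoraConfig

# Hypothetical values; the actual config was not included in the report.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)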
MRE
from trl import AutoModelForCausalLMWithValueHead
model = AutoModelForCausalLMWithValueHead.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
Output:
Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:04<00:00, 1.10s/it]
Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:02<00:00, 1.34it/s]
WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from 'meta-llama/Meta-Llama-3-8B-Instruct', and no v_head weight is found. This IS expected if you are not resuming PPO training.
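To see what the warning refers to, the value head on the wrapped model from the MRE can be inspected directly (the v_head attribute name follows current trl releases and may differ in older versions):

print(model.v_head)  # the value head module is present
print(sum(p.numel() for p in model.v_head.parameters()))  # its weights are freshly (randomly) initialized, hence the warning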
System Info
WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from './saved_models/fp-meta-llama3', and no v_head weight is found.
Information
Tasks
An officially supported task in the examples folder
Reproduction
I used SFTTrainer to finetune llama-3-8b-Instruct, and when I call AutoModelForCausalLMWithValueHead.from_pretrained on the saved model, the above warning appeared.
Expected behavior
Just want to know how to fix this warning. Should I add a v_head layer manually?