Open · BUILDERlym opened this issue 2 months ago
Hello @BUILDERlym, can you share the model (or a dummy version) so we can try to reproduce the error please?
For SFT:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    load_in_4bit=False,
    torch_dtype=torch.bfloat16,
    device_map=device_map,  # device_map comes from the script's own arguments
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
config = LoraConfig(
    r=LORA_R,  # LoRA hyperparameters are defined elsewhere in the script
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
Then use transformers.Trainer() to train and save.
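A minimal sketch of that training call, assuming placeholder TrainingArguments and an already-tokenized train_dataset (the output directory, batch size, and epoch count below are not from the original setup):

import transformers

trainer = transformers.Trainer(
    model=model,  # the PEFT-wrapped model from above
    args=transformers.TrainingArguments(
        output_dir="sft_output",         # placeholder
        per_device_train_batch_size=4,   # placeholder
        num_train_epochs=1,              # placeholder
        bf16=True,
    ),
    train_dataset=train_dataset,  # assumed: tokenized SFT dataset
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("sft_output")  # saves the LoRA adapter weights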
Merge the PEFT adapter:
from peft import PeftConfig, PeftModel

peft_config = PeftConfig.from_pretrained(peft_model_id)  # peft_model_id points at the saved adapter
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    return_dict=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    load_in_4bit=False,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()
model = model.merge_and_unload()
and save the merged model.
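For the save step, a minimal sketch (assuming the merged weights go into the same "saved_sft_model" directory that the PPO step below loads from):

model.save_pretrained("saved_sft_model")      # merged, adapter-free weights
tokenizer.save_pretrained("saved_sft_model")  # keep the tokenizer next to the weights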
For Reward Model:
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    num_labels=1,  # single scalar reward
    torch_dtype=torch.bfloat16,
    device_map=device_map,
    trust_remote_code=True,
)
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=16,  # 32,
    lora_dropout=0.05,  # 0.1,
    bias="none",
)
model = get_peft_model(model, peft_config)
import torch.nn as nn
from transformers import Trainer

class RewardTrainer(Trainer):
    # Pairwise loss: the chosen response (j) should be scored higher than the rejected one (k).
    def compute_loss(self, model, inputs, return_outputs=False):
        rewards_j = model(
            input_ids=inputs["input_ids_j"],
            attention_mask=inputs["attention_mask_j"],
        )[0]
        rewards_k = model(
            input_ids=inputs["input_ids_k"],
            attention_mask=inputs["attention_mask_k"],
        )[0]
        loss = -nn.functional.logsigmoid(rewards_j - rewards_k).mean()
        if return_outputs:
            return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}
        return loss
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=RewardDataCollatorWithPadding(
        tokenizer=tokenizer, max_length=512, pad_to_multiple_of=8
    ),
)
model.config.use_cache = False
trainer.train(script_args.resume_from_reward_checkpoint)
The same merging process is used for the reward-model adapter.
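A sketch of that merge for the sequence-classification model (the adapter and output paths are placeholders):

import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
reward_model = PeftModel.from_pretrained(base, "reward_adapter")  # placeholder adapter path
reward_model = reward_model.merge_and_unload()
reward_model.save_pretrained("saved_reward_model")  # placeholder output directory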
For PPO:
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "saved_sft_model",
    load_in_4bit=False,
    device_map="auto",
    peft_config=lora_config,
    torch_dtype=torch.bfloat16,
)
This step is the one that produces the v_head warning in question.
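The lora_config passed above is not shown in the snippet; a hypothetical definition, reusing CAUSAL_LM settings similar to the SFT step, might look like this:

from peft import LoraConfig

# Hypothetical values; the actual config was not included in the report.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)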
MRE
from trl import AutoModelForCausalLMWithValueHead
model = AutoModelForCausalLMWithValueHead.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
Output:
Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:04<00:00, 1.10s/it]
Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:02<00:00, 1.34it/s]
WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from 'meta-llama/Meta-Llama-3-8B-Instruct', and no v_head weight is found. This IS expected if you are not resuming PPO training.
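To see what the warning refers to, the value head on the wrapped model from the MRE can be inspected directly (the v_head attribute name follows current trl releases and may differ in older versions):

print(model.v_head)  # the value head module is present
print(sum(p.numel() for p in model.v_head.parameters()))  # its weights are freshly (randomly) initialized, hence the warning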
System Info
WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from './saved_models/fp-meta-llama3', and no v_head weight is found.
Information
Tasks
An officially supported task in the examples folder
Reproduction
I used SFTTrainer to finetune llama-3-8b-Instruct, and when I call AutoModelForCausalLMWithValueHead.from_pretrained on the saved model, the above warning appeared.
Expected behavior
Just want to know how to fix this warning. Should I add a v_head layer manually?