huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

No v_head weight is found #2095

Open BUILDERlym opened 2 months ago

BUILDERlym commented 2 months ago

System Info

WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from './saved_models/fp-meta-llama3', and no v_head weight is found.

Reproduction

  1. I first use SFTTrainer to fine-tune Llama-3-8B-Instruct, and when I call

    import torch
    from trl import AutoModelForCausalLMWithValueHead

    model = AutoModelForCausalLMWithValueHead.from_pretrained(
        "./saved_models/fp-meta-llama3",
        load_in_4bit=False,
        device_map="auto",
        peft_config=None,
        torch_dtype=torch.bfloat16,
    )

    the warning above appears.

Expected behavior

I just want to know how to fix this warning. Should I add a v_head layer manually?
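
(For reference: the v_head is not part of a plain transformers checkpoint; AutoModelForCausalLMWithValueHead attaches one and initializes it randomly when no saved weights are found, so nothing has to be added by hand. A minimal sketch to inspect it, reusing the checkpoint path above:

    import torch
    from trl import AutoModelForCausalLMWithValueHead

    model = AutoModelForCausalLMWithValueHead.from_pretrained(
        "./saved_models/fp-meta-llama3",
        torch_dtype=torch.bfloat16,
    )
    # the wrapper already carries a freshly initialized value head
    print(model.v_head)  # ValueHead with a Linear(hidden_size, 1) summary layer
)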

lewtun commented 2 months ago

Hello @BUILDERlym, can you share the model (or a dummy version) so we can try to reproduce the error please?

BUILDERlym commented 1 month ago

For SFT:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B-Instruct",
        load_in_4bit=False,
        torch_dtype=torch.bfloat16,
        device_map=device_map,
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
    # LORA_R, LORA_ALPHA, TARGET_MODULES, and LORA_DROPOUT are defined elsewhere in the script
    config = LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        target_modules=TARGET_MODULES,
        lora_dropout=LORA_DROPOUT,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)

Then I use transformers.Trainer() to train and save.
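
A minimal sketch of that step (training_args, train_dataset, and the output path are placeholders defined elsewhere in the script):

    from transformers import Trainer

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        tokenizer=tokenizer,
    )
    trainer.train()
    # placeholder path; with a PEFT model this saves only the adapter weights
    trainer.save_model("./saved_models/fp-meta-llama3-adapter")

To merge the PEFT adapter: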

    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # peft_config and peft_model_id point at the saved LoRA adapter
    model = AutoModelForCausalLM.from_pretrained(
        peft_config.base_model_name_or_path,
        return_dict=True,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        load_in_4bit=False,
    )
    model = PeftModel.from_pretrained(model, peft_model_id)
    model.eval()
    model = model.merge_and_unload()  # folds the LoRA weights into the base model

and then save the merged model.
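
For completeness, that save step is just (output path assumed from the warning above):

    model.save_pretrained("./saved_models/fp-meta-llama3")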

BUILDERlym commented 1 month ago

For Reward Model:

    import torch
    import torch.nn as nn
    from transformers import AutoModelForSequenceClassification, Trainer
    from peft import LoraConfig, TaskType, get_peft_model

    model = AutoModelForSequenceClassification.from_pretrained(
        "meta-llama/Meta-Llama-3-8B-Instruct",
        num_labels=1,  # a single scalar reward per sequence
        torch_dtype=torch.bfloat16,
        device_map=device_map,
        trust_remote_code=True,
    )
    peft_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        inference_mode=False,
        r=8,
        lora_alpha=16,  # 32,
        lora_dropout=0.05,  # 0.1,
        bias="none",
    )
    model = get_peft_model(model, peft_config)

    class RewardTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False):
            # scores for the preferred ("j") and rejected ("k") responses
            rewards_j = model(
                input_ids=inputs["input_ids_j"],
                attention_mask=inputs["attention_mask_j"],
            )[0]
            rewards_k = model(
                input_ids=inputs["input_ids_k"],
                attention_mask=inputs["attention_mask_k"],
            )[0]
            # pairwise ranking loss: push the preferred reward above the rejected one
            loss = -nn.functional.logsigmoid(rewards_j - rewards_k).mean()
            if return_outputs:
                return loss, {"rewards_j": rewards_j, "rewards_k": rewards_k}
            return loss

    trainer = RewardTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        data_collator=RewardDataCollatorWithPadding(
            tokenizer=tokenizer, max_length=512, pad_to_multiple_of=8
        ),
    )

    model.config.use_cache = False

    trainer.train(script_args.resume_from_reward_checkpoint)

The same process as above is used for merging the PEFT adapter.
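
(For context, the compute_loss above is the standard pairwise ranking, i.e. Bradley-Terry, objective: it drives the score of the preferred response j above that of the rejected response k. A tiny numeric sketch:

    import torch
    import torch.nn as nn

    rewards_j = torch.tensor([2.0])  # score for the preferred response
    rewards_k = torch.tensor([0.5])  # score for the rejected response
    loss = -nn.functional.logsigmoid(rewards_j - rewards_k).mean()
    print(loss.item())  # ~0.201; the loss shrinks as rewards_j - rewards_k grows
)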

BUILDERlym commented 1 month ago

For PPO:

    import torch
    from trl import AutoModelForCausalLMWithValueHead

    # lora_config is the LoraConfig defined for SFT above
    model = AutoModelForCausalLMWithValueHead.from_pretrained(
        "saved_sft_model",
        load_in_4bit=False,
        device_map="auto",
        peft_config=lora_config,
        torch_dtype=torch.bfloat16,
    )

This step produces the warning shown above.
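
(One way to see why the warning fires is to list the tensor names in the saved SFT checkpoint; a sketch assuming a single-file safetensors checkpoint, so adjust the filename if the save is sharded:

    from safetensors import safe_open

    with safe_open("saved_sft_model/model.safetensors", framework="pt") as f:
        print([k for k in f.keys() if "v_head" in k])  # [] -> no v_head weights were saved
)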

qgallouedec commented 1 month ago

MRE

from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

Output:

Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:04<00:00,  1.10s/it]
Loading checkpoint shards: 100%|██████████████████████| 4/4 [00:02<00:00,  1.34it/s]
WARNING:root:A <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'> model is loaded from 'meta-llama/Meta-Llama-3-8B-Instruct', and no v_head weight is found. This IS expected if you are not resuming PPO training.

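In other words, the warning is expected on a first PPO run: the SFT/base checkpoint only contains the language-model weights, so TRL attaches a freshly initialized v_head. Once the model is saved through the value-head wrapper, the v_head weights should be included and reloading no longer warns. A sketch (output path is a placeholder):

    from trl import AutoModelForCausalLMWithValueHead

    model = AutoModelForCausalLMWithValueHead.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
    model.save_pretrained("./llama3-with-v-head")  # placeholder path; v_head weights are saved too
    model = AutoModelForCausalLMWithValueHead.from_pretrained("./llama3-with-v-head")  # no warning now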