llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF
https://llava-rlhf.github.io/
GNU General Public License v3.0

Question about the reward model's score #35

Closed DripNowhy closed 2 months ago

DripNowhy commented 3 months ago

Hi, when I directly apply your reward model to the published preference dataset, the scores seem strange and are always < 0. I recorded the scores of the chosen and rejected responses, and found that the chosen score is not higher than the rejected score for most of the data. This is how I use the reward model:

# Loading the RM (this runs inside a class __init__; assumes os, torch,
# torch.nn as nn, peft.PeftModel, and LlavaLlamaForCausalLM from the LLaVA
# codebase are imported)
self.model_sft = LlavaLlamaForCausalLM.from_pretrained(
    sft_dir,
    device_map={"": "cuda:0"},
    torch_dtype=torch.bfloat16,
)

# Attach the reward-model LoRA weights on top of the SFT model
self.RM = PeftModel.from_pretrained(
    self.model_sft,
    rm_lora_dir,
)

# Linear reward head mapping the 5120-dim last hidden state to a scalar score
self.reward_head = nn.Linear(5120, 1).to(self.RM.device)
self.reward_head.load_state_dict(
    torch.load(
        os.path.join(rm_lora_dir, "reward_head"),
        map_location="cpu",
    )
)
# Prepare RM inputs: build the conversation prompt and tokenize it
def preprocess_reward_model(
    source,
    tokenizer,
):
    # Wrap question/answer in the default conversation template
    conv = conversation_lib.default_conversation.copy()
    conv.append_message(conv.roles[0], source["question"])
    conv.append_message(conv.roles[1], source["answer"])
    conversations = [conv.get_prompt()]
    # Tokenize, mapping the <image> placeholder to IMAGE_TOKEN_INDEX
    input_ids = torch.stack(
        [
            tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
            for prompt in conversations
        ],
        dim=0,
    ).cuda()
    # Drop the final token before feeding the RM
    return input_ids[:, :-1]
# Get the RM score; `cs` is the object that holds `RM` and `reward_head` from above
def _get_reward(input_ids, image_tensor):
    out = cs.RM(
        input_ids=input_ids,
        # images should match the dtype the model was loaded in (bfloat16 above)
        images=image_tensor.to(dtype=torch.bfloat16).cuda(),
        return_dict=True,
        output_hidden_states=True,
    )
    # Reward = linear head applied to the final hidden state at the last position
    last_hidden_states = out.hidden_states[-1].type_as(cs.reward_head.weight)
    reward = cs.reward_head(last_hidden_states[:, -1])[0]
    return reward
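
For reference, this is roughly how I compare a chosen/rejected pair with the pieces above (a minimal sketch; the `question` / `chosen` / `rejected` field names are illustrative, not necessarily the dataset's actual keys):

# Sketch only: record / image_tensor / tokenizer come from the preference dataset loader
def score_pair(record, image_tensor, tokenizer):
    chosen_ids = preprocess_reward_model(
        {"question": record["question"], "answer": record["chosen"]}, tokenizer
    )
    rejected_ids = preprocess_reward_model(
        {"question": record["question"], "answer": record["rejected"]}, tokenizer
    )
    with torch.no_grad():
        chosen_score = _get_reward(chosen_ids, image_tensor).item()
        rejected_score = _get_reward(rejected_ids, image_tensor).item()
    return chosen_score, rejected_score

(In principle, the absolute sign of the score does not matter for a pairwise-trained reward model; only the chosen-vs-rejected margin does.)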

Thank you!

DripNowhy commented 3 months ago

I realized that I didn't add the reward_prompt after the input. When I used a standard reward_prompt, I found that approximately 65% of the chosen responses had a higher reward score than the rejected responses. Is this result acceptable? This is how I add the reward_prompt:

def preprocess_reward_model(
    source,
    tokenizer,
):
    conv = conversation_lib.default_conversation.copy()
    conv.append_message(conv.roles[0], source["question"])
    conv.append_message(conv.roles[1], source["answer"])
    # Append the reward prompt after the conversation, then close with "</s>"
    conversations = [conv.get_prompt() + source["reward_prompt"] + "</s>"]
    input_ids = torch.stack(
        [
            tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
            for prompt in conversations
        ],
        dim=0,
    ).cuda()
    # Drop the final token before feeding the RM
    return input_ids[:, :-1]
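
For completeness, the ~65% number comes from a loop like the following (a minimal sketch; treating `pairs` as a list of `(record, image_tensor)` tuples and using `chosen` / `rejected` field names are my assumptions, not the dataset's actual layout):

# Sketch: fraction of preference pairs where the chosen response out-scores the rejected one
@torch.no_grad()
def preference_accuracy(pairs, tokenizer, reward_prompt):
    wins = 0
    for record, image_tensor in pairs:
        scores = []
        for answer in (record["chosen"], record["rejected"]):
            source = {
                "question": record["question"],
                "answer": answer,
                "reward_prompt": reward_prompt,
            }
            input_ids = preprocess_reward_model(source, tokenizer)
            scores.append(_get_reward(input_ids, image_tensor).item())
        wins += scores[0] > scores[1]
    return wins / len(pairs)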
Edward-Sun commented 2 months ago

Yeah 65% RM accuracy is pretty decent 👍

tunantu commented 4 weeks ago

> I realized that I didn't add the reward_prompt after the input. When I used a standard reward_prompt, I found that approximately 65% of the chosen responses had a higher reward score than the rejected responses. Is this result acceptable? [...]

Hey, may I know which reward prompt you used here? I am facing the same issue.

DripNowhy commented 3 weeks ago
conversations.append(
    conv.get_prompt() + source['reward_prompt'] + "</s>"
)

Replace source['reward_prompt'] with the prompt below; it seems to work for me:

USER: Please evaluate the quality of your last response. There are several dimensions you should consider in your evaluation:

1. Accurate: The AI should provide factual and accurate information from the image, and refrain from making statements that are not supported by the image or inconsistent with the image.
2. Helpful: The AI’s response should precisely serve the user's needs and interests, while grounding the response in the image.
3. Language Natural: The AI should employ language that flows smoothly and is free from repetitive or awkward constructs.
4. Concise: The AI should efficiently address the task or answer the question, communicating the necessary information with brevity and clarity.

A good response should be accurate, helpful, language natural, and concise. ASSISTANT: Following your definitions, the quality score of my last response is
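
For reference, one way to wire this in with the helpers from earlier in the thread (a minimal sketch; REWARD_PROMPT is just my name for a constant holding the text above verbatim, and question / answer / image_tensor / tokenizer are defined as in the earlier snippets):

# Hypothetical constant; in practice it holds the full prompt quoted above, verbatim
REWARD_PROMPT = (
    "USER: Please evaluate the quality of your last response. "
    "..."  # the four evaluation dimensions, exactly as quoted above
    " ASSISTANT: Following your definitions, the quality score of my last response is"
)

source = {
    "question": question,            # original user question
    "answer": answer,                # model response being scored
    "reward_prompt": REWARD_PROMPT,  # appended before "</s>" in preprocess_reward_model
}
score = _get_reward(preprocess_reward_model(source, tokenizer), image_tensor)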