xfactlab / orpo

Official repository for ORPO
Apache License 2.0

Doubt about the formatting of the `prompt`, `chosen` and `rejected` #7

Closed alvarobartt closed 7 months ago

alvarobartt commented 7 months ago

Hi there @jiwooya1000!

While exploring the codebase to get a 1:1 reproduction with the alignment-handbook (see https://github.com/huggingface/alignment-handbook/pull/143), I noticed that you're adding the generation_prompt.

Example

Given the following prompt, chosen and rejected values:

prompt = [{"role": "user", "content": "What's the capital of South Korea?"}]
chosen = [{"role": "assistant", "content": "Seoul"}]
rejected = [{"role": "assistant", "content": "Madrid"}]

Applying the chat template mappings defined in the code at:

https://github.com/xfactlab/orpo/blob/23964e92cf590f02e281a320714c3498dc47a3b8/main.py#L86-L88

yields the following values after apply_chat_template:

prompt = "<|user|>\nWhat's the capital of South Korea?\n<|assistant|>\n"
chosen = "<|user|>\nWhat's the capital of South Korea?\n<|assistant|>\nSeoul"
rejected = "<|user|>\nWhat's the capital of South Korea?\n<|assistant|>\nMadrid"
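
As a rough illustration of the mapping described above, here is a minimal sketch that reproduces those three strings without loading a tokenizer. The `render_chat` helper is hypothetical: it is a simplified stand-in for `apply_chat_template`, and its template format is only inferred from the strings quoted in this issue. It also assumes that the prompt is rendered with the generation prompt appended, and that each response is rendered together with the prompt turn.

```python
def render_chat(messages, add_generation_prompt=False):
    """Hypothetical stand-in for tokenizer.apply_chat_template.

    Assumption: each turn renders as "<|role|>\\ncontent" and turns are
    joined by a newline, which matches the strings quoted in the issue.
    """
    rendered = "\n".join(f"<|{m['role']}|>\n{m['content']}" for m in messages)
    if add_generation_prompt:
        # Open an assistant turn so the model would continue from here.
        rendered += "\n<|assistant|>\n"
    return rendered


prompt = [{"role": "user", "content": "What's the capital of South Korea?"}]
chosen = [{"role": "assistant", "content": "Seoul"}]
rejected = [{"role": "assistant", "content": "Madrid"}]

# The prompt gets the generation prompt; the responses are rendered
# together with the prompt turn (assumed from the output shown above).
prompt_text = render_chat(prompt, add_generation_prompt=True)
chosen_text = render_chat(prompt + chosen)
rejected_text = render_chat(prompt + rejected)

print(repr(prompt_text))
print(repr(chosen_text))
print(repr(rejected_text))
```

Under these assumptions, both `chosen_text` and `rejected_text` start with `prompt_text`, so the two responses differ only in the tokens after the assistant marker.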

alvarobartt commented 7 months ago

Oops, I may need some sleep. I see now that you're doing it that way so that the attention_mask covers the whole conversation for both the chosen and the rejected messages in the chat, which is why the prompt is required there. Sorry for the misunderstanding!
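
To make the point about the attention_mask concrete, here is a toy sketch, not the repository's code. `toy_tokenize` and `encode_pair` are hypothetical helpers, and the whitespace tokenizer is a stand-in for a real one. The `response_mask` is an extra assumption about how the separately rendered prompt could be used: its length tells you where the response tokens begin inside the full sequence.

```python
def toy_tokenize(text):
    # Hypothetical whitespace tokenizer, standing in for a real tokenizer.
    return text.split()


def encode_pair(prompt_text, full_text, max_length=12):
    """Build masks for a prompt + response sequence (illustrative only)."""
    tokens = toy_tokenize(full_text)
    prompt_len = len(toy_tokenize(prompt_text))
    if len(tokens) > max_length:
        raise ValueError("sequence longer than max_length")
    pad = max_length - len(tokens)
    # The attention mask covers the whole conversation (prompt + response).
    attention_mask = [1] * len(tokens) + [0] * pad
    # Assumption: the prompt length marks which positions are response tokens.
    response_mask = [0] * prompt_len + [1] * (len(tokens) - prompt_len) + [0] * pad
    return attention_mask, response_mask


prompt_text = "<|user|>\nWhat's the capital of South Korea?\n<|assistant|>\n"
chosen_text = prompt_text + "Seoul"

attention_mask, response_mask = encode_pair(prompt_text, chosen_text)
print(attention_mask)
print(response_mask)
```

In this sketch the model still attends to the prompt tokens when scoring the response, while the second mask isolates the response positions. That is one plausible reason to render the prompt on its own as well as inside the chosen and rejected sequences.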

AIR-hl commented 5 months ago

@alvarobartt Sorry, I don't understand that. Why do the chosen and rejected responses need to include the prompt?