princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

The output of the reproduced model has "<|start_header_id|>assistant<|end_header_id|>" at the beginning #17

Closed: binzhwang closed this issue 4 weeks ago

binzhwang commented 2 months ago

Why does the output of my reproduced model in the llama-3-8B-Instruct-SimPO experiment have "<|start_header_id|>assistant<|end_header_id|>" at the beginning?

Example output on alpaca_eval:

```json
{
  "dataset": "helpful_base",
  "instruction": "I am going to make pumpkin pie for the first time. Can you help me?",
  "output": "<|start_header_id|>assistant<|end_header_id|>>Pumpkin pie making for the first time! I'd be delighted to guide you through the process.\n\nMaking a classic pumpkin pie from scratch can be a fun and rewarding experience. Here's a simple recipe and some tips to ensure your pie turns out delicious:\n\nIngredients:\n\nFor the crust:\n\n 2 1\/4 cups all-purpose flour\n 1 tsp salt\n 1\/2 cup cold unsalted butter, cut into small cubes\n 1\/4 cup ice water\n\nFor the filling:\n\n 1 cup cooked, mashed pumpkin (canned or fresh, cooked and mashed)\n 1 1\/2 cups heavy cream\n 1\/2 cup granulated sugar\n 1\/2 tsp salt\n 1\/2 tsp ground cinnamon\n 1\/4 tsp ground nutmeg\n 1\/4 tsp ground ginger\n 2 large eggs\n\nInstructions:\n\n1. Make the crust: In a large bowl, combine flour and salt. Add the cubed butter and use a pastry blender or your fingers to work it into the flour until the mixture resembles coarse crumbs. Gradually add ice water, stirring with a fork until the dough comes together in a ball. Wrap and refrigerate for at least 30 minutes.\n2. Preheat and roll out the crust: Preheat your oven to 425\u00b0F (220\u00b0C). On a lightly floured surface, roll out the chilled dough to a thickness of about 1\/8 inch. Transfer the dough to a 9-inch pie dish and trim the edges to fit.\n3. Prepare the filling: In a separate bowl, whisk together pumpkin, heavy cream, sugar, salt, cinnamon, nutmeg, and ginger until well combined. Beat in the eggs until smooth.\n4. Fill and bake the pie: Pour the pumpkin mixture into the pie crust. Bake for 15 minutes, then reduce the oven temperature to 350\u00b0F (180\u00b0C) and continue baking for an additional 30-40 minutes, or until the filling is set and the crust is golden brown.\n5. Cool and serve: Let the pie cool on a wire rack for at least 2 hours before serving. Enjoy!\n\nTips:\n\n Use fresh, high-quality ingredients, including real pumpkin puree and heavy cream.\n Don't overmix the crust dough or filling, as this can lead to a tough pie.\n Blind baking (lining the crust with parchment paper and weights) can help prevent shrinkage, but it's not necessary for a single-crust pie like this recipe.\n If you're unsure about the pie's doneness, check for a jewel-toned orange filling and a firm, set texture.\n\nI hope this helps you make a delicious pumpkin pie for your occasion! If you have any specific questions or concerns, feel free to ask.",
  "generator": "Llama-3-Instruct-8B-SimPO"
}
```

xiamengzhou commented 2 months ago

Hi! It seems that you didn't add the auxiliary tokens to the generation template, so the model outputs them in its generations. We use this template for Llama 3 decoding, and using it should resolve the issue. Let me know if it works; happy to help further if you have any other questions.
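For reference, here is a minimal sketch of how the template can be applied on the decoding side with transformers (the user message is just a placeholder; this is not the repo's exact evaluation code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Llama-3-Instruct-8B-SimPO")

messages = [{"role": "user", "content": "Can you help me make a pumpkin pie?"}]

# add_generation_prompt=True appends <|start_header_id|>assistant<|end_header_id|>
# to the prompt, so the model continues with the answer instead of generating
# the header tokens itself.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```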

binzhwang commented 2 months ago

Thank you for your answer! Unfortunately, it doesn't work. I used this template and the 'configs.yaml' on alpaca_eval, and the training process followed 'llama-3-8b-instruct-simpo.yaml'. The issue occurs in about 5% of the cases, not all the time, which makes this phenomenon confusing to me.

xiamengzhou commented 2 months ago

Hi, this is weird... As you can see from our output file, none of the outputs contain <|start_header_id|>assistant<|end_header_id|>. Have you tried decoding with the released checkpoint (i.e., princeton-nlp/Llama-3-Instruct-8B-SimPO)? That should help us pinpoint whether it is a training or a decoding issue.
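To make the comparison concrete, here is a minimal decoding sketch with the released checkpoint (greedy decoding for simplicity; the actual evaluation settings may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-SimPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "I am going to make pumpkin pie for the first time. Can you help me?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# If the stray header shows up with the reproduced model but not with this
# checkpoint under the same script, the problem is on the training side.
output = model.generate(input_ids, max_new_tokens=512)
# skip_special_tokens=False so the header would be visible if generated.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```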

binzhwang commented 2 months ago

Yes, I previously decoded both the officially released checkpoint and the original meta-llama-3-8B-instruct for comparison, and none of their outputs contain <|start_header_id|>assistant<|end_header_id|>.

yumeng5 commented 2 months ago

Hi,

Not sure what exactly caused the issue, but a good sanity check is to look at the preprocessed prompts and responses. The relevant code is located here:

https://github.com/princeton-nlp/SimPO/blob/26c060cbfc42b35d1b63b6e5177e8a0e957f49a4/scripts/run_simpo.py#L111-L117

For example, you may print out the preprocessed results by adding the following lines:

```python
print(f"prompt: {example['text_prompt']}")
print(f"chosen: {example['text_chosen']}")
print(f"rejected: {example['text_rejected']}")
```

and the outputs of one example should look like the following:

```
prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

let's play a game. I will give you the rules in the next prompt<|eot_id|>
chosen: <|start_header_id|>assistant<|end_header_id|>

Sounds like fun! I'd love to play a game with you. Please go ahead and share the rules, and I'll do my best to understand and follow them. I'm ready when you are!<|eot_id|>
rejected: <|start_header_id|>assistant<|end_header_id|>

That sounds like fun! I'm ready to play. Please go ahead and share the rules for the game.<|eot_id|>
```

Specifically, <|start_header_id|>assistant<|end_header_id|> should only be added to the start of the responses and not to the prompt. There was a similar issue where the assistant header was also added to the prompt, leading to repeated headers. You can find more details here: https://github.com/princeton-nlp/SimPO/issues/8#issue-2319547436.

I'm not sure if this applies to your case, but examining the preprocessed results could hopefully help clarify the issue.
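If it helps, this check can be automated over the whole preprocessed dataset. A sketch, assuming the text_prompt/text_chosen/text_rejected columns produced by run_simpo.py above:

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"

def check_example(example):
    # The assistant header must not leak into the prompt...
    assert ASSISTANT_HEADER not in example["text_prompt"], example["text_prompt"]
    # ...and each response should start with exactly one header.
    for key in ("text_chosen", "text_rejected"):
        assert example[key].startswith(ASSISTANT_HEADER), example[key][:100]
        assert example[key].count(ASSISTANT_HEADER) == 1, example[key][:100]
    return example

# raw_datasets is the DatasetDict after the preprocessing map in run_simpo.py.
raw_datasets["train"].map(check_example)
```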

Best, Yu

binzhwang commented 2 months ago

Hi! Thanks for the detailed answer! Unfortunately, it still does not work for me.

I debugged this code and printed the concatenated_batch information:

[screenshot of concatenated_batch token IDs]

where 128006 is <|start_header_id|>, 78191 is "assistant", and 128007 is <|end_header_id|>.

I think the tokens with those IDs should be masked as well. The version of trl I used is v0.8.6. Should their labels be set to -100? Is this the expected behavior or a bug?
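A quick way to see exactly which tokens contribute to the loss is to decode only the label positions that are not -100. This is a debugging sketch that assumes trl's DPOTrainer conventions (the concatenated_labels / concatenated_input_ids keys and -100 as the label padding value), which the SimPO trainer builds on:

```python
# Inside the trainer, right after concatenated_batch is built.
labels = concatenated_batch["concatenated_labels"][0]
input_ids = concatenated_batch["concatenated_input_ids"][0]

# Keep only the positions that are actually trained on (label != -100).
trained_ids = input_ids[labels != -100]
print(self.tokenizer.decode(trained_ids))
# If <|start_header_id|>assistant<|end_header_id|> appears in this decoded
# string, the header tokens are being treated as part of the response and
# receive gradient, i.e., they are not masked out.
```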