huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.28k stars · 367 forks

Why does the alignment-handbook include user & system inputs in the loss calculation? #56

Open · xffxff opened this issue 8 months ago

xffxff commented 8 months ago

I noticed that the alignment-handbook doesn't ignore the loss calculated on the user and system inputs. To my knowledge, many SFT implementations mask these tokens out of the loss. I'm curious about the reasoning behind this difference.
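For context, the masking that the question refers to is usually implemented by setting prompt-token labels to -100 (the index PyTorch's `CrossEntropyLoss` ignores by default). A minimal sketch, with hypothetical token ids:

```python
# Standard SFT computes loss over every token; "completion-only" training
# instead masks the labels of prompt (system + user) tokens so the
# cross-entropy loss is computed only on the assistant's reply.
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss


def mask_prompt_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Copy input_ids into labels, masking the first prompt_len tokens."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]


# Example: 4 prompt tokens (system + user turn) followed by 3 completion tokens.
input_ids = [101, 2023, 2003, 102, 7592, 2088, 102]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 7592, 2088, 102]
```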

nathan-az commented 8 months ago

I'm curious on the official response here.

My guess would be:

  • Currently packing does not work with completion-only training in TRL's implementation, which makes training on massive datasets much slower
  • In my experience, completion-only training yielded worse performance when finetuning for new tasks, evaluated on those tasks specifically

FYI: if you want to fork, you can use completion-only training with minimal changes.
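To make the "minimal changes" concrete, here is a rough pure-Python sketch of what a completion-only data collator (such as TRL's `DataCollatorForCompletionOnlyLM`) does: it locates the response template in each token sequence and masks everything up to and including it. Token ids and the template below are hypothetical.

```python
# Sketch of completion-only label masking: find the response-template tokens
# (e.g. the ids of "### Response:\n") and mask all labels before the reply,
# so loss is computed on the assistant completion only.
IGNORE_INDEX = -100


def completion_only_labels(input_ids: list[int],
                           response_template: list[int]) -> list[int]:
    """Mask labels up to and including the response template tokens."""
    n = len(response_template)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template:
            boundary = start + n
            return [IGNORE_INDEX] * boundary + input_ids[boundary:]
    # Template not found: mask the whole sequence rather than train on the prompt.
    return [IGNORE_INDEX] * len(input_ids)


template = [50, 51]                      # stands in for response-template token ids
ids = [10, 11, 12, 50, 51, 20, 21, 22]   # prompt, template, then completion
print(completion_only_labels(ids, template))
# [-100, -100, -100, -100, -100, 20, 21, 22]
```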

MAOJIASONG commented 2 months ago

> I'm curious on the official response here.
>
> My guess would be:
>
>   • Currently packing does not work with completion-only training in TRL's implementation, which makes training on massive datasets much slower
>   • In my experience, completion-only training yielded worse performance when finetuning for new tasks, evaluated on those tasks specifically
>
> FYI: if you want to fork, you can use completion-only training with minimal changes.
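The packing limitation in the first bullet can be illustrated with a small sketch (hypothetical token ids): packing concatenates many short examples and chops the stream into fixed-length blocks so no compute is wasted on padding, but completion-only masking then needs per-example prompt boundaries inside each block, which the packed path did not track at the time of this thread.

```python
# Sketch of sequence packing: concatenate examples into one token stream and
# split it into fixed-length blocks (remainder dropped). Example boundaries
# fall mid-block, so prompt masks can no longer be applied per example.
def pack(examples: list[list[int]], block_size: int) -> list[list[int]]:
    """Concatenate token lists and chop into full blocks of block_size."""
    stream = [tok for ex in examples for tok in ex]
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]


examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(pack(examples, block_size=4))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```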

Do you have any reference or evidence for the worse performance of completion-only tuning on new tasks? I want to learn more!

nathan-az commented 2 months ago

> Do you have any reference or evidence for the worse performance of completion-only tuning on new tasks? I want to learn more!

Nope, no references other than trying it on an internal use case and seeing much worse eval results. I didn't look much further into it and just went back to packing without completion-only.

I still linked the docs because I'd encourage interested parties to try it out :)