xffxff opened 8 months ago
I'm curious about the official response here.
My guess would be:
- Packing currently doesn't work with completion-only training in TRL's implementation, which makes training on massive datasets much slower
- In my experience, completion-only training yielded worse performance when finetuning for new tasks, as measured by evals on those tasks specifically
FYI If you want to fork, you can use completion-only training with minimal changes.
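For what it's worth, the "minimal changes" boil down to masking the prompt tokens out of the loss. Here is a library-free sketch, assuming the standard convention that a label of -100 is skipped by PyTorch's cross-entropy loss (the `mask_prompt_labels` helper and the token ids are hypothetical, purely for illustration):

```python
# Completion-only loss masking sketch: set the label to -100 (the
# ignore_index used by PyTorch's CrossEntropyLoss) for every prompt
# token, so the loss is computed only on the completion tokens.

IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, prompt_len):
    """Return labels where prompt positions are ignored by the loss."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Example: 4 prompt tokens followed by a 3-token completion.
input_ids = [101, 102, 103, 104, 201, 202, 203]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)
```

If I remember correctly, TRL's `DataCollatorForCompletionOnlyLM` does roughly this, locating the completion boundary via a response template string rather than an explicit prompt length.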
Do you have any reference or evidence for worse performance with completion-only tuning on new tasks? I want to learn more!
Nope, no references other than trying it on an internal use case and seeing much worse eval results. I didn't look into it further and just went back to packing without completion-only.
I still linked the docs because I'd encourage interested parties to try it out :)
I noticed that the alignment-handbook doesn't ignore the loss computed on the user and system inputs. Based on my knowledge, many SFT implementations choose to mask these out. I'm curious about the reasoning behind this difference.