What does this PR do?
#2162 introduced a breaking change on all trainers except DPO and SFT (the `tokenizer` argument was replaced by `processing_class`). We've had feedback that this change was too abrupt, so we're reintroducing the argument with an extended timeline for its removal and also clarifying our removal strategy.
This PR will be part of a patch release for v0.12.
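For context, the reintroduction follows the usual keyword-deprecation pattern: `tokenizer` is accepted again, emits a deprecation warning, and is forwarded to `processing_class` until the versions listed in the schedule below. The snippet is only a minimal sketch of that pattern; `ExampleTrainer` is a placeholder, not the actual TRL implementation.

```python
import warnings

class ExampleTrainer:  # hypothetical stand-in for a TRL trainer, e.g. RewardTrainer
    def __init__(self, model=None, processing_class=None, tokenizer=None, **kwargs):
        if tokenizer is not None:
            # Keep accepting the old argument, but warn about its planned removal.
            warnings.warn(
                "`tokenizer` is deprecated and will be removed in a future release; "
                "use `processing_class` instead.",
                FutureWarning,
            )
            if processing_class is None:
                processing_class = tokenizer
        self.processing_class = processing_class
```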
Here is the proposed schedule, depending on the usage of each trainer:
| Trainer | Num models on the Hub | Argument removed in version |
|---|---|---|
| SFT | 15,097 | 0.16 |
| DPO | 3,053 | 0.16 |
| PPO | 441 | 0.15 |
| ORPO | 320 | 0.15 |
| Reward | 270 | 0.15 |
| KTO | 80 | 0.14 |
| CPO | 26 | 0.14 |
| Online DPO | 14 | 0.14 |
| RLOO | 4 | 0.14 |
| XPO | 3 | 0.13 |
| Nash | 1 | 0.13 |
| GKD | 1 | 0.13 |
| Iterative SFT | 1 | 0.13 |
| BCO | 0 | 0.13 |
cc @muellerzr
Related: #2290 #2226 #2207 #2218
Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.