huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Add `tokenizer` arg back and add deprecation guidelines #2348

Closed qgallouedec closed 1 week ago

qgallouedec commented 1 week ago

What does this PR do?

#2162 introduced a breaking change in all trainers except DPO and SFT (the `tokenizer` argument was replaced by `processing_class`). We've had feedback that this change was too abrupt, so we're reintroducing the argument with an extended timeline for its removal, and also clarifying our removal strategy.
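The reintroduction could follow the usual deprecation pattern: accept the old `tokenizer` argument, emit a warning, and forward it to `processing_class`. This is a minimal sketch of that pattern (the class name and signature are hypothetical, not TRL's actual code):

```python
import warnings


class SketchTrainer:
    """Hypothetical trainer illustrating a backward-compatible `tokenizer` arg.

    Passing `tokenizer` still works, but raises a FutureWarning and the value
    is forwarded to `processing_class` until the scheduled removal version.
    """

    def __init__(self, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            # Warn but keep working, so existing user code is not broken.
            warnings.warn(
                "`tokenizer` is deprecated and will be removed in a future "
                "version; use `processing_class` instead.",
                FutureWarning,
            )
            if processing_class is None:
                processing_class = tokenizer
        self.processing_class = processing_class
```

Old calls like `SketchTrainer(tokenizer=tok)` then behave like `SketchTrainer(processing_class=tok)`, with a warning pointing users to the new name.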

This PR will be part of a patch release for v0.12.

Here is the proposed schedule, depending on the usage of each trainer:

| Trainer | Num. models on the Hub | Argument removed in version |
| --- | ---: | --- |
| SFT | 15,097 | 0.16 |
| DPO | 3,053 | 0.16 |
| PPO | 441 | 0.15 |
| ORPO | 320 | 0.15 |
| Reward | 270 | 0.15 |
| KTO | 80 | 0.14 |
| CPO | 26 | 0.14 |
| Online DPO | 14 | 0.14 |
| RLOO | 4 | 0.14 |
| XPO | 3 | 0.13 |
| Nash | 1 | 0.13 |
| GKD | 1 | 0.13 |
| Iterative SFT | 1 | 0.13 |
| BCO | 0 | 0.13 |

cc @muellerzr

Related: #2290 #2226 #2207 #2218


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr commented 1 week ago

Beautiful! 🔥