huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Add `tokenizer` arg back and add deprecation guidelines #2348

Closed qgallouedec closed 1 week ago

qgallouedec commented 1 week ago

What does this PR do?

#2162 introduced a breaking change in all trainers except DPO and SFT (the `tokenizer` argument was replaced by `processing_class`). We've had feedback that this change was too abrupt, so we're reintroducing the argument with an extended timeline for its removal, and also clarifying our removal strategy.
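The reintroduction could follow the usual deprecation pattern: accept the old `tokenizer` argument, emit a warning, and forward it to `processing_class`. This is a minimal sketch of that pattern (the class name and signature are hypothetical, not TRL's actual code):

```python
import warnings


class SketchTrainer:
    """Hypothetical trainer illustrating a backward-compatible `tokenizer` arg.

    Passing `tokenizer` still works, but raises a FutureWarning and the value
    is forwarded to `processing_class` until the scheduled removal version.
    """

    def __init__(self, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            # Warn but keep working, so existing user code is not broken.
            warnings.warn(
                "`tokenizer` is deprecated and will be removed in a future "
                "version; use `processing_class` instead.",
                FutureWarning,
            )
            if processing_class is None:
                processing_class = tokenizer
        self.processing_class = processing_class
```

Old calls like `SketchTrainer(tokenizer=tok)` then behave like `SketchTrainer(processing_class=tok)`, with a warning pointing users to the new name.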

This PR will be part of a patch release for v0.12.

Here is the proposed schedule, depending on the usage of each trainer:

| Trainer | Num. models on the Hub | Argument removed in version |
| --- | ---: | --- |
| SFT | 15,097 | 0.16 |
| DPO | 3,053 | 0.16 |
| PPO | 441 | 0.15 |
| ORPO | 320 | 0.15 |
| Reward | 270 | 0.15 |
| KTO | 80 | 0.14 |
| CPO | 26 | 0.14 |
| Online DPO | 14 | 0.14 |
| RLOO | 4 | 0.14 |
| XPO | 3 | 0.13 |
| Nash | 1 | 0.13 |
| GKD | 1 | 0.13 |
| Iterative SFT | 1 | 0.13 |
| BCO | 0 | 0.13 |

cc @muellerzr

Related: #2290 #2226 #2207 #2218


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr commented 1 week ago

Beautiful! 🔥