huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
9.56k stars 1.19k forks

change the `process` function in the example of DPO #1753

Closed AIR-hl closed 3 months ago

AIR-hl commented 3 months ago

#1752 #1541

younesbelkada commented 3 months ago

cc @kashif

HuggingFaceDocBuilderDev commented 3 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AIR-hl commented 3 months ago

thanks! nice catch

@kashif Hi! I only modified the code of the DPO example, but the checks seem to show an error related to SFT.

vwxyzjn commented 3 months ago

LGTM.