Closed: alvarobartt closed this PR 5 months ago
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The docs no longer live here :(
Yes, I believe that's expected! Anyway, thanks for mentioning, I'll submit a PR to add the documentation for ORPO, since it's missing now 👍🏻
Description

This PR adds the `run_orpo.py` Python script to fine-tune LLMs with the to-be-released `trl.ORPOTrainer`. Besides that, some changes have been applied to the dataset formatting to also support DPO/ORPO datasets formatted as `prompt`-`chosen`-`rejected`, and `orpo` has been added as a task in `apply_chat_template`.
Additionally, this PR adds prompt filtering based on length, if provided among the `model_args`, similarly to what's done in the official ORPO codebase, for consistency when replicating their experiments.

Experiments
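As a rough sketch of the kind of prompt-length filtering described above (assumptions: a whitespace split stands in for real tokenization, and the limit is hard-coded here rather than read from `model_args`):

```python
# Hypothetical sketch: dropping examples whose prompt exceeds a length limit.
# A whitespace split stands in for a real tokenizer; the limit is hard-coded,
# whereas the actual script would take it from model_args.

MAX_PROMPT_LENGTH = 5  # stand-in for a max_prompt_length setting

def within_prompt_length(example: dict, max_length: int = MAX_PROMPT_LENGTH) -> bool:
    """Keep only examples whose prompt has at most `max_length` (pseudo-)tokens."""
    return len(example["prompt"].split()) <= max_length

dataset = [
    {"prompt": "Short prompt"},
    {"prompt": "This prompt is definitely much too long to keep here"},
]
filtered = [ex for ex in dataset if within_prompt_length(ex)]
# Only the first example survives the filter.
```

With a `datasets.Dataset` the same predicate would be passed to `.filter(...)` instead of a list comprehension.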
A raw version of the script has been run, but more tests are needed; if there's an interesting use case, I'm happy to collaborate on the release of `run_orpo.py`, as recently done for both Zephyr Gemma #129 and StarChat 2 #135 🤗

- Mistral-7B-v0.1 fine-tune with `argilla/distilabel-capybara-dpo-7k-binarized`, as in https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k