huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.53k stars 393 forks

Add `run_orpo.py` #143

Closed alvarobartt closed 5 months ago

alvarobartt commented 6 months ago

Description

This PR adds the `run_orpo.py` Python script to fine-tune LLMs with the soon-to-be-released `trl.ORPOTrainer`.

Besides that, some changes have been applied to the dataset formatting to also support DPO/ORPO datasets formatted as prompt-chosen-rejected, and `orpo` has been added as a task in `apply_chat_template`.
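As a rough illustration of the prompt-chosen-rejected formatting mentioned above, here is a minimal sketch (the field names and the helper `to_orpo_example` are assumptions for illustration, not the PR's actual code) of turning a flat record into the chat-message lists a chat template would consume:

```python
# Hypothetical helper: rebuild chat-format "chosen"/"rejected" message lists
# from a flat prompt-chosen-rejected record, as used by DPO/ORPO-style datasets.
def to_orpo_example(record: dict) -> dict:
    """Convert {"prompt", "chosen", "rejected"} strings into message lists."""
    prompt_turn = [{"role": "user", "content": record["prompt"]}]
    return {
        # Each side shares the same prompt turn, then diverges on the assistant reply.
        "chosen": prompt_turn + [{"role": "assistant", "content": record["chosen"]}],
        "rejected": prompt_turn + [{"role": "assistant", "content": record["rejected"]}],
    }

example = {
    "prompt": "What is ORPO?",
    "chosen": "A reference-free preference-optimization method.",
    "rejected": "No idea.",
}
formatted = to_orpo_example(example)
print(formatted["chosen"][0]["role"])  # user
```

The message lists produced this way are what `apply_chat_template` would then render into model-ready text.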

Additionally, this PR adds prompt filtering based on length, if provided among the `model_args`, similar to what's done in the official ORPO codebase, for consistency when replicating their experiments.
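The length-based filtering amounts to dropping examples whose prompt exceeds a maximum token count. A hedged sketch (whitespace splitting stands in for the real tokenizer, and `filter_by_prompt_length` / `max_prompt_length` are illustrative names, not the PR's actual API):

```python
# Sketch of prompt-length filtering: keep only examples whose prompt fits
# within `max_prompt_length` "tokens". A real implementation would count
# tokenizer(...)["input_ids"]; whitespace splitting is a stand-in here.
def filter_by_prompt_length(examples: list[dict], max_prompt_length: int) -> list[dict]:
    kept = []
    for ex in examples:
        n_tokens = len(ex["prompt"].split())  # stand-in for tokenizer token count
        if n_tokens <= max_prompt_length:
            kept.append(ex)
    return kept

data = [
    {"prompt": "short prompt"},
    {"prompt": "a much longer prompt " * 50},  # 200 words, over the limit
]
print(len(filter_by_prompt_length(data, max_prompt_length=64)))  # 1
```

With a `datasets.Dataset` this would more idiomatically be a `dataset.filter(...)` call with the same predicate.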

Experiments

A raw version of the script has been run, but more tests are needed. If there's an interesting use case, I'm happy to collaborate on the release of `run_orpo.py`, as recently done for both Zephyr Gemma #129 and StarChat 2 #135 🤗

Mistral-7B-v0.1 fine-tuned with argilla/distilabel-capybara-dpo-7k-binarized, as in https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes 4 scripts/run_orpo.py recipes/mistral-capybara/orpo/config_full.yaml
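For context, the referenced `config_full.yaml` would look roughly like the sketch below. This is a hypothetical excerpt modeled on other alignment-handbook recipes, not the actual file from this PR; every key and value here is an assumption:

```yaml
# Hypothetical excerpt of recipes/mistral-capybara/orpo/config_full.yaml
# (keys modeled on other handbook recipes; values are illustrative only).
model_name_or_path: mistralai/Mistral-7B-v0.1
torch_dtype: bfloat16

dataset_mixer:
  argilla/distilabel-capybara-dpo-7k-binarized: 1.0

# Trainer arguments (assumed names, following the ORPO paper's hyperparameters)
max_length: 2048
max_prompt_length: 1792
learning_rate: 5.0e-6
num_train_epochs: 3
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
```

The `accelerate launch` command above then distributes this recipe across 4 processes with the DeepSpeed ZeRO-3 config.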
HuggingFaceDocBuilderDev commented 6 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

nisten commented 5 months ago

> The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The docs no longer live here :(

alvarobartt commented 5 months ago

> The docs no longer live here :(

Yes, I believe that's expected! Anyway, thanks for mentioning it; I'll submit a PR to add the documentation for ORPO, since it's missing now 👍🏻