huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

[CLI] Extend training support to all trainers #2101

Open lewtun opened 1 week ago

lewtun commented 1 week ago

Feature request

The CLI currently supports training models with SFT/DPO/KTO: https://github.com/huggingface/trl/blob/6859e048da601fec181997a324e7b351fc997a33/trl/commands/cli.py#L24

It would be good to extend this support to all trainers so that we have a consistent API and also learn which parts of our scripts need refactoring to support this usage.

This could be tackled in separate PRs to keep things lightweight, and I'll track the trainers to add here in order of priority (based on Hub usage):

Motivation

It is somewhat annoying that one cannot train a model with every trainer through the CLI, as this is helpful for fast debugging and iteration.

Your contribution

Happy to open PRs, but this could be a good first issue for new contributors!

qgallouedec commented 1 week ago

Duplicate of #1811; closing that one in favour of this one.

grumpyp commented 6 days ago

Hi, it looks like this would just mean extending the SUPPORTED_COMMANDS constant?

For instance, a script for orpo already exists: https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py

I didn't check the others, but I did compare the kto script with the orpo one.

lewtun commented 6 days ago

@grumpyp yes I think this is all one needs - a better solution would be to have an automated way to populate this constant by globbing all the _trainer.py files (e.g. make cli_commands). This way, anytime someone adds a new trainer, we automatically get support for it in the CLI :)
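To make the globbing idea concrete, here is a minimal sketch of how the command list could be derived from file names. It is not the actual TRL implementation: `discover_commands` and `TRAINER_SUFFIX` are hypothetical names, and the demo runs against a throwaway directory of empty files rather than the real `trl/trainer` folder. It also assumes every `*_trainer.py` file maps one-to-one onto a CLI command, which may not hold for every trainer.

```python
import tempfile
from pathlib import Path

# Hypothetical naming convention: "<command>_trainer.py".
TRAINER_SUFFIX = "_trainer.py"


def discover_commands(trainer_dir):
    """Return sorted command names derived from `*_trainer.py` file names."""
    return sorted(
        path.name[: -len(TRAINER_SUFFIX)]
        for path in Path(trainer_dir).glob(f"*{TRAINER_SUFFIX}")
        if path.name != TRAINER_SUFFIX  # skip a file with an empty prefix
    )


# Demo on a temporary directory that mimics a trainer folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sft_trainer.py", "dpo_trainer.py", "kto_trainer.py",
                 "orpo_trainer.py", "utils.py", "__init__.py"):
        (Path(tmp) / name).touch()
    commands = discover_commands(tmp)

print(commands)  # ['dpo', 'kto', 'orpo', 'sft']
```

With something along these lines, adding a new `*_trainer.py` file would be enough for it to show up in the list, without editing the constant by hand.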

grumpyp commented 6 days ago

yes definitely! If you want, assign the issue to me please. I'll try to get a PR out today.

lewtun commented 4 days ago

assigned! thanks for the offer to help 🤗

grumpyp commented 3 days ago

hi @lewtun

I didn't want to manipulate .py files via the Makefile, so I took a slightly different approach.

It now creates the commands dynamically using a utility function, which was originally cached. EDIT: I removed the caching, since the function isn't called anywhere else and the process terminates after running, so the cache would never be reused.

Let me know if that works for you or if it needs changes. Thanks for the opportunity to contribute.
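
On the caching point in the EDIT above, a small sketch of why an in-memory cache gives no benefit here. The thread doesn't say which caching mechanism was used, so `functools.lru_cache` is an assumption, and `get_supported_commands` is a hypothetical stand-in for the utility function. The cache only helps on repeated calls within the same process, and it is discarded when the process exits.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def get_supported_commands():
    """Hypothetical stand-in for the utility that builds the command list."""
    global call_count
    call_count += 1
    return ("sft", "dpo", "kto")


first = get_supported_commands()   # runs the body (cache miss)
second = get_supported_commands()  # served from the in-memory cache (hit)
info = get_supported_commands.cache_info()

print(call_count, info.hits, info.misses)  # 1 1 1
```

A CLI invocation that calls the function once and then exits only ever takes the cache-miss path, so dropping the decorator doesn't change behaviour.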