martin-wey / CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences
https://arxiv.org/abs/2403.09032
MIT License

How to train #2

Closed: renmengjie7 closed this issue 2 months ago

renmengjie7 commented 2 months ago

How do I use the code in src/dpo to train a model? Could you provide an example script?

martin-wey commented 2 months ago

Hi, we will soon add a detailed README explaining how to run the scripts.

To train a model using DPO:

# make the repository root importable
export PYTHONPATH=$PYTHONPATH:$(pwd)
# launch DPO training with the QLoRA recipe
python src/dpo/run_dpo.py src/dpo/recipes/config_qlora_dpo_codellama_base.yaml
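
For orientation, a recipe of this kind typically bundles the model, dataset, QLoRA, and DPO hyperparameters in one file. The sketch below is only illustrative: the key names follow typical TRL-style QLoRA + DPO setups and the values are placeholders, not the actual contents of our recipe:

# hedged sketch, not the actual recipe -- check config_qlora_dpo_codellama_base.yaml
model_name_or_path: codellama/CodeLlama-7b-hf
torch_dtype: bfloat16
dataset_name: {your_preference_dataset}  # placeholder dataset id
# QLoRA settings
load_in_4bit: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
# DPO settings
beta: 0.1
learning_rate: 5.0e-6
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
output_dir: ./runs/codellama-dpo-qlora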

You can check all the arguments in the config_qlora_dpo_codellama_base.yaml file. If you have your own model checkpoint stored locally, you can run:

python src/dpo/run_dpo.py src/dpo/recipes/config_qlora_dpo_codellama.yaml \
  --model_name_or_path={path_to_your_model_checkpoint}

Make sure to adjust the chat_template argument to match your model type/family :)
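
Assuming chat_template can be overridden from the command line in the same way as model_name_or_path (an assumption based on the override shown above), that would look like:

python src/dpo/run_dpo.py src/dpo/recipes/config_qlora_dpo_codellama.yaml \
  --model_name_or_path={path_to_your_model_checkpoint} \
  --chat_template={your_chat_template}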

SFT training works the same way, using the recipe file config_qlora_sft_codellama.yaml (see the sketch below).
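
A minimal sketch of that invocation, assuming an SFT entry point analogous to run_dpo.py (the script and recipe paths below are guesses; adjust them to the actual repository layout):

# hypothetical paths, mirroring the DPO invocation above
export PYTHONPATH=$PYTHONPATH:$(pwd)
python src/sft/run_sft.py src/sft/recipes/config_qlora_sft_codellama.yaml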

renmengjie7 commented 2 months ago

Thank you very much for your quick reply!