princeton-nlp / SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
MIT License

Hyperparameters #66

Closed XYaoooo closed 1 month ago

XYaoooo commented 1 month ago

Hi, could I know your hyperparameters when training with DPO (batch size, beta, learning rate)? I train on 8 A100s with a per-device batch size of 16 (as you said in the paper, the batch size is 128), but it runs out of memory. Best

xiamengzhou commented 1 month ago

Hi, you should use gradient accumulation to reduce the per-device batch size while keeping the effective batch size unchanged. Please refer to this script for more details: https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml
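A minimal sketch of the batch-size arithmetic involved, assuming the 8-GPU setup from this thread and an effective batch size of 128; the per-device batch size of 2 here is only illustrative, not the repo's actual setting:

```python
# Gradient accumulation: run several small micro-batches per GPU and
# accumulate gradients before each optimizer step, so the effective
# (global) batch size stays the same while peak memory drops.
num_gpus = 8                 # 8 x A100, as in the question
effective_batch_size = 128   # global batch size from the paper
per_device_batch_size = 2    # illustrative value small enough to fit in memory

# Number of micro-batches accumulated before each optimizer step.
grad_accum_steps = effective_batch_size // (num_gpus * per_device_batch_size)
print(grad_accum_steps)  # 8

# Sanity check: the effective batch size is recovered exactly.
assert num_gpus * per_device_batch_size * grad_accum_steps == effective_batch_size
```

In Hugging Face-style training configs like the linked YAML, these correspond to fields such as `per_device_train_batch_size` and `gradient_accumulation_steps`.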

XYaoooo commented 1 month ago

Got it. Thanks for your advice.