princeton-nlp / SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
MIT License

Hyper-parameter tuning for other models #69

Open Tejaswgupta opened 3 weeks ago

Tejaswgupta commented 3 weeks ago

Kudos on the great work so far. I saw the hyper-parameters for the mainstream models, but is there any resource for finding the optimal hyper-parameters for models like Qwen without extensive trial and error?

Thanks in advance.

yumeng5 commented 1 week ago

Hi @Tejaswgupta

We found that hyperparameter tuning is quite necessary for all models (Llama, Gemma, Mistral, etc.) and all methods (DPO, SimPO, etc.) we experimented with, so I don't think one can directly obtain the optimal hyperparameters for a new model without tuning.

That said, we provide a guide to help tune hyperparameters more efficiently: tune the learning rate first while keeping the other hyperparameters (beta and gamma) fixed at their default values, and then tune gamma a bit (by adjusting gamma_beta_ratio while keeping beta fixed), as in the sketch below. This should hopefully reduce the number of trials required.
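For concreteness, here is a minimal Python sketch of that two-stage sweep. The `run_simpo` helper, the default values, and the search grids are illustrative placeholders (not values recommended by the SimPO authors); swap in your own training launch and evaluation metric.

```python
from typing import Dict


def run_simpo(learning_rate: float, beta: float, gamma_beta_ratio: float) -> float:
    """Placeholder: launch a SimPO training run with the given hyperparameters
    and return a validation score (e.g., a held-out win rate). Replace this
    body with a call into your actual training/evaluation pipeline."""
    return 0.0  # dummy score so the sketch runs end to end


# Stage 1: sweep the learning rate with beta and gamma_beta_ratio fixed
# at default values (the numbers below are illustrative only).
default_beta = 2.0
default_gamma_beta_ratio = 0.5
lr_grid = [3e-7, 5e-7, 8e-7, 1e-6]

lr_scores: Dict[float, float] = {
    lr: run_simpo(lr, default_beta, default_gamma_beta_ratio) for lr in lr_grid
}
best_lr = max(lr_scores, key=lr_scores.get)

# Stage 2: keep the best learning rate and beta fixed, and sweep
# gamma_beta_ratio (which sets the target reward margin gamma = beta * ratio).
ratio_grid = [0.3, 0.5, 0.8]
ratio_scores: Dict[float, float] = {
    r: run_simpo(best_lr, default_beta, r) for r in ratio_grid
}
best_ratio = max(ratio_scores, key=ratio_scores.get)

print(f"best learning_rate={best_lr}, best gamma_beta_ratio={best_ratio}")
```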

I hope this is helpful!

Best, Yu