Tejaswgupta opened this issue 3 weeks ago
Hi @Tejaswgupta
We found that hyperparameter tuning is necessary for all the models (Llama, Gemma, Mistral, etc.) and all the methods (DPO, SimPO, etc.) we experimented with, so I don't think one can directly obtain the optimal hyperparameters for a new model without tuning.
That said, we provided a guide to help tune hyperparameters more efficiently: tune the learning rate first while keeping the other hyperparameters (beta and gamma) fixed at their default values, and then tune gamma a bit (by tuning the gamma_beta_ratio while keeping beta fixed). This should reduce the number of trials required.
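To make the staged procedure concrete, here is a minimal sketch of the trial schedule it implies. The specific values (learning rates, beta, the default ratio) are illustrative placeholders, not recommendations from the authors, and `staged_grid` is a hypothetical helper, not part of the repo:

```python
# Sketch of the staged tuning described above: sweep the learning rate first
# with beta and gamma_beta_ratio pinned at defaults, then sweep
# gamma_beta_ratio at the best learning rate. All values are illustrative.

def staged_grid(lrs, ratios, beta=2.0, default_ratio=0.5):
    """Yield (lr, beta, gamma_beta_ratio) trials for the two tuning stages."""
    # Stage 1: vary only the learning rate.
    stage1 = [(lr, beta, default_ratio) for lr in lrs]

    # Stage 2: at the chosen lr, vary the ratio (the default was already run).
    def stage2(best_lr):
        return [(best_lr, beta, r) for r in ratios if r != default_ratio]

    return stage1, stage2

lrs = [3e-7, 5e-7, 8e-7, 1e-6]
ratios = [0.3, 0.5, 0.8]
stage1, stage2 = staged_grid(lrs, ratios)
print(len(stage1))        # 4 trials in stage 1
print(len(stage2(5e-7)))  # 2 additional trials in stage 2
```

With these example grids the staged search costs 6 runs instead of the 12 a full lr x ratio grid would take, which is the point of fixing beta and gamma first.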
I hope this is helpful!
Best, Yu
Kudos on the great work so far. I saw the hyperparameters for the mainstream models, but is there any resource for finding the optimal hyperparameters for models like Qwen without extensive trial and error?
Thanks in advance.