huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Question about DPO learning rate - comparison to neural-chat-7b-v3 training #25

Open · sebastianschramm opened this issue 9 months ago

sebastianschramm commented 9 months ago

The default learning rate in the DPO recipe config is set to 5e-7, while https://huggingface.co/Intel/neural-chat-7b-v3 was trained with a learning rate of 1e-4 (using, of course, a different dataset: https://huggingface.co/datasets/Open-Orca/SlimOrca).
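
For context, here is roughly where that learning rate would be set in a DPO training run. This is a minimal sketch assuming TRL's `DPOConfig`/`DPOTrainer` API; the base model and dataset are illustrative placeholders, and exact argument names vary across TRL versions:

```python
# Minimal DPO training sketch (assumes TRL's DPOConfig/DPOTrainer API).
# Model and dataset choices below are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with "prompt" / "chosen" / "rejected" columns.
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
)

training_args = DPOConfig(
    output_dir="dpo-output",
    learning_rate=5e-7,  # the handbook's DPO default under discussion
    beta=0.1,            # DPO KL-penalty coefficient
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,            # reference model is created internally if not passed
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,    # newer TRL versions use processing_class= instead
)
trainer.train()
```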

However, I am wondering about the significant difference in learning rate (a factor of 200), given that both models seem to perform well. Any insights you can share? Thank you