huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.2k stars 357 forks

Reward Modeling Support #109

Open agi-piggy opened 5 months ago

agi-piggy commented 5 months ago

Hi team, great work! I wonder whether there will be a demo or example of training reward models in a multi-GPU environment or with DeepSpeed settings?

Thanks!
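There isn't a reward-modeling recipe in the handbook yet, but as a rough sketch of what one could look like: TRL's `RewardTrainer` is a standard `transformers` trainer, so it runs under `accelerate` the same way the handbook's SFT and DPO scripts do, including with the DeepSpeed ZeRO-3 config shipped in `recipes/accelerate_configs/`. The script name `run_reward.py`, the base model, and the dataset below are placeholders, not anything the handbook provides; the column names (`input_ids_chosen`, etc.) follow the pairwise format `RewardTrainer` expected at the time of this issue.

```python
# Hypothetical run_reward.py -- minimal reward-model training sketch using
# TRL's RewardTrainer. Model and dataset names are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reward models are sequence classifiers with a single scalar output (the score).
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Pairwise preference data with "chosen"/"rejected" text columns.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

def preprocess(example):
    # Tokenize both sides of each preference pair into the columns
    # RewardTrainer consumes.
    chosen = tokenizer(example["chosen"], truncation=True)
    rejected = tokenizer(example["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

dataset = dataset.map(preprocess)

training_args = RewardConfig(
    output_dir="reward-model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset,
)
trainer.train()
```

Multi-GPU / DeepSpeed then comes entirely from the launcher, e.g. reusing the handbook's existing accelerate config:

```shell
accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml run_reward.py
```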