Improve the ROSA customer experience (and customer retention) by leveraging foundation models to provide ChatGPT-style search over Red Hat customer documentation assets.
Collection of Comparison Data for Reward Model Training #58
The process of training a reward model for Reinforcement Learning from Human Feedback (RLHF) involves gathering comparison data to accurately assess the quality of prompt-response pairs. This comparison data aids in fine-tuning a Large Language Model (LLM) using Reinforcement Learning (RL) to generate responses that align with human preferences. This issue aims to guide the development of a systematic workflow for collecting comparison data using the Argilla platform.
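Whatever tool collects the annotations, the comparison data ultimately has to reach the reward model as pairwise preferences. Below is a minimal sketch of that conversion step: annotators rank several responses to a prompt, and each ranking is expanded into (prompt, chosen, rejected) pairs for a pairwise reward-model loss. The `Comparison` record and `to_pairwise` helper are illustrative names, not Argilla's API.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Comparison:
    """One annotated comparison: a prompt plus responses ranked best-first."""
    prompt: str
    ranked_responses: list  # ordered from most to least preferred

def to_pairwise(comparisons):
    """Expand ranked comparisons into (prompt, chosen, rejected) triples,
    the format a pairwise reward-model loss typically consumes."""
    pairs = []
    for c in comparisons:
        # Every pair drawn from a best-first ranking yields (better, worse).
        for better, worse in combinations(c.ranked_responses, 2):
            pairs.append((c.prompt, better, worse))
    return pairs
```

A ranking of n responses yields n·(n−1)/2 training pairs, so even a small number of annotated comparisons produces a usefully larger pairwise dataset.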